Preference Queries* 



Jan Chomicki 
Dept. of Computer Science and Engineering 
University at Buffalo 
Buffalo, NY 14260-2000 
chomicki@cse.buffalo.edu 

1st February 2008 



Abstract 

The handling of user preferences is becoming an increasingly important issue in 
present-day information systems. Among others, preferences are used for information 
filtering and extraction to reduce the volume of data presented to the user. They are 
also used to keep track of user profiles and formulate policies to improve and automate 
decision making. 

We propose here a simple, logical framework for formulating preferences as preference 
formulas. The framework does not impose any restrictions on the preference relations 
and allows arbitrary operation and predicate signatures in preference formulas. It also 
makes the composition of preference relations straightforward. We propose a simple, 
natural embedding of preference formulas into relational algebra (and SQL) through a 
single winnow operator parameterized by a preference formula. The embedding makes 
possible the formulation of complex preference queries, e.g., involving aggregation, by 
piggybacking on existing SQL constructs. It also leads in a natural way to the defi- 
nition of further, preference-related concepts like ranking. Finally, we present general 
algebraic laws governing the winnow operator and its interaction with other relational 
algebra operators. The preconditions on the applicability of the laws are captured by 
logical formulas. The laws provide a formal foundation for the algebraic optimization of 
preference queries. We demonstrate the usefulness of our approach through numerous 
examples. 



1 Introduction 

The handling of user preferences is becoming an increasingly important issue in present- 
day information systems. Among others, preferences are used for information filtering and 
extraction to reduce the volume of data presented to the user. They are also used to keep 
track of user profiles and formulate policies to improve and automate decision making. 

*This is an expanded version of the paper [?]. CoRR paper cs.DB/0207094. 

The research literature on preferences is extensive. It encompasses preference logics 
[29, ?, 18], preference reasoning [30, 28, ?], prioritized nonmonotonic reasoning and logic 
programming [?, 11, 27] and decision theory [?, ?] (the list is by no means exhaustive). 
However, only a few papers, most of them very recent, address the issue of user preferences 
in the context of database queries. Two different approaches are pursued: qualitative and 
quantitative. In the qualitative approach [24, ?, ?, 22, ?], the preferences between tuples 
in the answer to a query are specified directly, typically using binary preference relations. 

Example 1.1 We introduce here one of the examples used throughout the paper. Consider 
the relation Book(ISBN, Vendor, Price) and the following preference relation $\succ_1$ between 
Book tuples: 

prefer one Book tuple to another if and only if their ISBNs are the same and 
the Price of the first is lower. 

Consider the following instance $r_1$ of Book: 

ISBN        Vendor        Price 
----------  ------------  ------ 
0679726691  BooksForLess  $14.75 
0679726691  LowestPrices  $13.50 
0679726691  QualityBooks  $18.80 
0062059041  BooksForLess  $7.30 
0374164770  LowestPrices  $21.88 


Then clearly the second tuple is preferred to the first one which in turn is preferred to 
the third one. There is no preference defined between any of those three tuples and the 
remaining tuples. 

In the quantitative approach [?, ?], preferences are specified indirectly using scoring 
functions that associate a numeric score with every tuple of the query answer. Then a tuple 
$t_1$ is preferred to a tuple $t_2$ iff the score of $t_1$ is higher than the score of $t_2$. The 
qualitative approach is strictly more general than the quantitative one, since one can define 
preference relations in terms of scoring functions (if the latter are explicitly given), while 
not every intuitively plausible preference relation can be captured by scoring functions. 

Example 1.2 There is no scoring function that captures the preference relation described 
in Example 1.1. Since there is no preference defined between any of the first three tuples and 
the fourth one, the score of the fourth tuple should be equal to all of the scores of the first 
three tuples. But this implies that the scores of the first three tuples are the same, which is 
not possible since the second tuple is preferred to the first one which in turn is preferred to 
the third one. 

This lack of expressiveness of the quantitative approach is well known in utility theory 
[?, ?]. 

In the present paper, we contribute to the qualitative approach by defining a logical 
framework for formulating preferences and its embedding into relational query languages. 

We believe that combining preferences with queries is very natural and useful. The 
applications in which user preferences are prominent will benefit from applying the mod- 
ern database technology. For example, in decision-making applications databases may be 
used to store the space of possible configurations. Also, the use of a full-fledged query 
language makes it possible to formulate complex decision problems, a feature missing from 
most previous, non-database, approaches to preferences. For example, the formulation of 
the problem may now involve quantifiers, grouping, or aggregation. At the same time, by 
explicitly addressing the technical issues involved in querying with preferences, present-day 
DBMS may expand their scope. 

The framework presented in this paper consists of two parts: a formal first-order logic 
notation for specifying preferences and an embedding of preferences into relational query 
languages. In this way both abstract properties of preferences (like asymmetry or transi- 
tivity) and evaluation of preference queries can be studied to a large degree separately. 

Preferences are defined using binary preference relations between tuples. Preference 
relations are specified using first-order formulas. We focus mostly on intrinsic preference 
formulas. Such formulas can refer only to built-in predicates. In that way we capture 
preferences that are based only on the values occurring in tuples, not on other properties 
like membership of tuples in database relations. We show how the latter kind of preferences, 
called extrinsic, can also be simulated in our framework in some cases. 

We propose a new relational algebra operator called winnow that selects from its argu- 
ment relation the most preferred tuples according to the given preference relation. Although 
the winnow operator can be expressed using other operators of relational algebra, by con- 
sidering it on its own we can on one hand focus on the abstract properties of preference 
relations (e.g., transitivity) and on the other, study special evaluation and optimization 
techniques for the winnow operator itself. For SQL, we are faced with a similar choice: 
either the language is appropriately extended with an SQL equivalent of winnow, or the 
occurrences of winnow are translated into SQL. The first alternative looks more promising; 
however, in this paper we do not commit ourselves to any specific syntactic expression of 
winnow in SQL. 

We want to capture many different varieties of preference and related notions: uncon- 
ditional vs. conditional preferences, nested and hierarchical preferences, groupwise prefer- 
ences, indifference, iterated preferences and ranking, and integrity constraints and vetoes. 

The main contributions of this paper are as follows: 

1. a simple, logical framework for formulating preferences as preference formulas. The 
framework does not impose any restrictions on the preference relations and allows 
arbitrary operation and predicate signatures in preference formulas. It also makes the 
composition of preference relations straightforward. 

2. a simple, natural embedding of preference formulas into relational algebra (and SQL) 
through a single winnow operator parameterized by a preference formula. The embedding 
makes possible the formulation of complex preference queries, e.g., involving 
aggregation, by piggybacking on existing SQL constructs. It also leads in a natural 
way to the definition of further, preference-related concepts like ranking. 

3. general algebraic laws governing the winnow operator and its interaction with other 
relational algebra operators. The preconditions on the applicability of the laws are 
captured by logical formulas. The laws provide a formal foundation for the algebraic 
optimization of preference queries. 

In Section 2, we define the basic concepts of preference relation, preference formula, and 
the winnow operator. We also introduce several examples that will be used throughout the 
paper. In Section 3, we study the basic properties of preference relations. In Section 4, 
which contains the main technical contributions of the paper, we present the main properties 
of the winnow operator, characterize its expressive power, and outline - for completeness - a 
number of evaluation algorithms that were proposed elsewhere. In Section 5, we explore the 
composition of preferences. In Section ??, we show how the winnow operator together with 
other constructs of relational algebra and SQL makes it possible to express a wide variety 
of preference queries. In Section ??, we show how iterating the winnow operator provides a 
ranking of tuples and introduce a weak version of the winnow operator that is helpful for 
preference relations that are not strict partial orders. We discuss related work in Section ?? 
and conclude with a brief discussion of further work in Section ??. All the non-trivial proofs 
are given. 

2 Basic notions 

We are working in the context of the relational model of data. We assume two infinite 
domains: D (uninterpreted constants) and N (numbers). We do not distinguish between 
different numeric domains, since it is not necessary for the present paper. When necessary, 
we assume that database instances are finite. (Some results hold without the finiteness 
assumption.) Additionally, we have the standard built-in predicates. In the paper, we will 
move freely between relational algebra and SQL. 

2.1 Basic definitions 

Preference formulas are used to define binary preference relations. 

Definition 2.1 Given a relation schema $R(A_1 \cdots A_k)$ such that $U_i$, $1 \le i \le k$, is the 
domain (either $D$ or $N$) of the attribute $A_i$, a relation $\succ$ is a preference relation over $R$ if 
it is a subset of $(U_1 \times \cdots \times U_k) \times (U_1 \times \cdots \times U_k)$. 

Intuitively, $\succ$ will be a binary relation between pairs of tuples from the same (database) 
relation. We say that a tuple $t_1$ dominates a tuple $t_2$ in $\succ$ if $t_1 \succ t_2$. 
Typical properties of the relation $\succ$ include: 

• irreflexivity: $\forall x.\ x \not\succ x$, 

• asymmetry: $\forall x, y.\ x \succ y \Rightarrow y \not\succ x$, 

• transitivity: $\forall x, y, z.\ (x \succ y \wedge y \succ z) \Rightarrow x \succ z$, 

• negative transitivity: $\forall x, y, z.\ (x \not\succ y \wedge y \not\succ z) \Rightarrow x \not\succ z$, 

• connectivity: $\forall x, y.\ x \succ y \vee y \succ x \vee x = y$. 

The relation $\succ$ is: a strict partial order if it is irreflexive, asymmetric and transitive; a 
total order if it is a connected strict partial order; a weak order if it is a negatively transitive 
strict partial order. At this point, we do not assume any properties of $\succ$, although in most 
applications it will satisfy at least the properties of a strict partial order. 
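
The properties listed above can be checked mechanically on a finite sample of tuples. The following Python sketch is only an illustration: the helper names (`is_irreflexive`, `classify`, and so on) and the sample data are assumptions introduced here, and a check on a finite sample says nothing about the relation over the whole, possibly infinite, domain.

```python
from itertools import product

def is_irreflexive(dom, pref):
    return not any(pref(x, x) for x in dom)

def is_asymmetric(dom, pref):
    return not any(pref(x, y) and pref(y, x) for x, y in product(dom, repeat=2))

def is_transitive(dom, pref):
    return all(pref(x, z) for x, y, z in product(dom, repeat=3)
               if pref(x, y) and pref(y, z))

def is_negatively_transitive(dom, pref):
    return all(not pref(x, z) for x, y, z in product(dom, repeat=3)
               if not pref(x, y) and not pref(y, z))

def is_connected(dom, pref):
    return all(pref(x, y) or pref(y, x) or x == y
               for x, y in product(dom, repeat=2))

def classify(dom, pref):
    """Classify pref, restricted to the finite sample dom."""
    spo = (is_irreflexive(dom, pref) and is_asymmetric(dom, pref)
           and is_transitive(dom, pref))
    return {
        "strict_partial_order": spo,
        "weak_order": spo and is_negatively_transitive(dom, pref),
        "total_order": spo and is_connected(dom, pref),
    }

# Hypothetical samples: plain numbers under >, and (isbn, price) pairs
# under "same ISBN and lower price" (the preference of Example 1.1).
numbers = [1, 2, 3]
greater = lambda x, y: x > y
books = [("A", 10), ("A", 12), ("B", 11)]
cheaper_same_isbn = lambda s, t: s[0] == t[0] and s[1] < t[1]

print(classify(numbers, greater))
print(classify(books, cheaper_same_isbn))
```

On these samples the numeric order satisfies all three classifications, while the book preference is a strict partial order that is neither a weak order nor a total order, because tuples with different ISBNs are never related.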

Definition 2.2 A preference formula (pf) $C(t_1, t_2)$ is a first-order formula defining a 
preference relation $\succ_C$ in the standard sense, namely 

$$t_1 \succ_C t_2 \ \text{ iff } \ C(t_1, t_2).$$

An intrinsic preference formula (ipf) is a preference formula that uses only built-in 
predicates. 

We will limit our attention to preference relations defined using preference formulas. 
By using the notation $\succ_C$ for a preference relation, we assume that there is an underlying 
preference formula $C$. 

Ipfs can refer to equality ($=$) and inequality ($\neq$) when comparing values that are 
uninterpreted constants, and to the standard set of built-in arithmetic comparison operators 
when referring to numeric values (there are no function symbols). We will call an ipf that 
references only arithmetic comparisons ($=, \neq, <, >, \le, \ge$) pure comparison. Without loss of 
generality, we will assume that ipfs are in DNF (Disjunctive Normal Form) and quantifier-free 
(the theories involving the above predicates admit quantifier elimination). A formula 
in DNF is called k-DNF if it has at most $k$ disjuncts. 

In this paper, we mostly restrict ourselves to ipfs and preference relations defined by 
such formulas. The main reason is that ipfs define fixed, although possibly infinite, relations. 
As a result, they are computationally easier and more amenable to syntactic manipulation 
than general pfs. For instance, transitively closing an ipf results in a finite formula (Theorem 
5.3), which is typically not the case for pfs. However, we formulate in full generality the 
results that hold for arbitrary pfs. 

We define now an algebraic operator that picks from a given relation the set of the most 
preferred tuples, according to a given preference formula. 

Definition 2.3 If $R$ is a relation schema and $C$ a preference formula defining a preference 
relation $\succ_C$ over $R$, then the winnow operator is written as $\omega_C(R)$, and for every instance 
$r$ of $R$: 

$$\omega_C(r) = \{ t \in r \mid \neg \exists t' \in r.\ t' \succ_C t \}.$$

A preference query is a relational algebra query containing at least one occurrence of 
the winnow operator. 

2.2 Examples 

The first example illustrates how preference queries are applied to information extraction: 
here obtaining the best price of a given book. 



Example 2.1 Consider the relation Book(ISBN, Vendor, Price) from Example 1.1. The 
preference relation $\succ_{C_1}$ from this example can be defined using the formula $C_1$: 

$$(i, v, p) \succ_{C_1} (i', v', p') \equiv i = i' \wedge p < p'.$$

The answer to the preference query $\omega_{C_1}(Book)$ provides for every book the information 
about the vendors offering the lowest price for that book. For the given instance $r_1$ of Book, 
applying the winnow operator $\omega_{C_1}$ returns the tuples 

ISBN        Vendor        Price 
----------  ------------  ------ 
0679726691  LowestPrices  $13.50 
0062059041  BooksForLess  $7.30 
0374164770  LowestPrices  $21.88 

Note that in the above example, the preferences are applied groupwise: separately for each 
book. Note also that due to the properties of $<$, the preference relation $\succ_{C_1}$ is irreflexive, 
asymmetric and transitive. 
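
Definition 2.3 translates almost literally into code. The sketch below is a naive illustration in Python, not an evaluation strategy proposed in the paper: the function names `winnow` and `c1` are chosen here, tuples are plain Python tuples, and prices are stored as floats for simplicity.

```python
def winnow(r, pref):
    """Return the tuples of r that no tuple of r is preferred to."""
    return [t for t in r if not any(pref(s, t) for s in r)]

def c1(s, t):
    """Formula C1: same ISBN and strictly lower price."""
    return s[0] == t[0] and s[2] < t[2]

# The instance r1 of Book from Example 1.1.
book = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]

best = winnow(book, c1)
for row in best:
    print(row)
```

The result contains one tuple per ISBN, matching the table above; the grouping by ISBN is never stated explicitly and follows from the equality test inside `c1`.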



The second example illustrates how preference queries are used in automated decision 
making to obtain the most desirable solution to a (very simple) configuration problem. 

Example 2.2 Consider two relations Wine(Name, Type) and Dish(Name, Type) and a 
view Meal that contains possible meal configurations 

CREATE VIEW Meal (Dish, DishType, Wine, WineType) AS 
SELECT * FROM Dish, Wine; 

Now the preference for white wine in the presence of fish and for red wine in the presence 
of meat can be expressed as the following preference formula $C_2$ over Meal: 

$$\begin{aligned}
(d, dt, w, wt) \succ_{C_2} (d', dt', w', wt') \equiv{} & (d = d' \wedge dt = \text{'fish'} \wedge wt = \text{'white'} \wedge dt' = \text{'fish'} \wedge wt' = \text{'red'}) \\
\vee{} & (d = d' \wedge dt = \text{'meat'} \wedge wt = \text{'red'} \wedge dt' = \text{'meat'} \wedge wt' = \text{'white'})
\end{aligned}$$

Notice that this will force any white wine to be preferred over any red wine for fish, and 
just the opposite for meat. For other kinds of dishes, no preference is indicated. This is 
an example of a relative preference. Consider now the preference query $\omega_{C_2}(Meal)$. It will 
pick the most preferred meals, according to the above-stated preferences. Notice that in the 
absence of any white wine, red wine can be selected for fish. 

The above preferences are conditional, since they depend on the type of the dish being 
considered. Note that the relation $\succ_{C_2}$ in this example is irreflexive and asymmetric. 
Transitivity is obtained trivially because the chains of $\succ_{C_2}$ are of length at most 2. Note also 
that the preference relation is defined without referring to any domain order. 

Note also that the meals with a wine which is neither red nor white but, e.g., rose, are 
not related through $\succ_{C_2}$ to the meals with either of those kinds of wine. Therefore, the 
preference query $\omega_{C_2}(Meal)$ will return also the meals involving such wines, as they are not 
dominated by other meals. If this is undesirable, one can express an absolute preference for 
white wine for fish (and red wine for meat) using the formula $C_3$: 

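$$\begin{aligned}
(d, dt, w, wt) \succ_{C_3} (d', dt', w', wt') \equiv{} & (d = d' \wedge dt = \text{'fish'} \wedge wt = \text{'white'} \wedge dt' = \text{'fish'} \wedge wt' \neq \text{'white'}) \\
\vee{} & (d = d' \wedge dt = \text{'meat'} \wedge wt = \text{'red'} \wedge dt' = \text{'meat'} \wedge wt' \neq \text{'red'})
\end{aligned}$$

Similarly, an unconditional preference for red wine for any kind of meal can also be defined 
as a first-order formula $C_4$: 

$$(d, dt, w, wt) \succ_{C_4} (d', dt', w', wt') \equiv d = d' \wedge wt = \text{'red'} \wedge wt' \neq \text{'red'}.$$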
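
The difference between the relative preference $C_2$ and the absolute preference $C_3$ is easy to see on a small instance. The sketch below is illustrative only: the wine and dish names are hypothetical, and `winnow` is the naive evaluation of Definition 2.3 rather than anything prescribed by the paper.

```python
from itertools import product

def winnow(r, pref):
    """Naive winnow: keep the tuples of r not dominated within r."""
    return [t for t in r if not any(pref(s, t) for s in r)]

# Hypothetical instances of Dish(Name, Type) and Wine(Name, Type).
dishes = [("Sole", "fish"), ("Steak", "meat")]
wines = [("Chablis", "white"), ("Merlot", "red"), ("Tavel", "rose")]

# Meal(Dish, DishType, Wine, WineType): all dish/wine combinations.
meal = [d + w for d, w in product(dishes, wines)]

def c2(s, t):
    """Relative preference: white over red for fish, red over white for meat."""
    d, dt, _, wt = s
    d2, dt2, _, wt2 = t
    return d == d2 and (
        (dt == "fish" and wt == "white" and dt2 == "fish" and wt2 == "red")
        or (dt == "meat" and wt == "red" and dt2 == "meat" and wt2 == "white"))

def c3(s, t):
    """Absolute preference: white over anything else for fish, red for meat."""
    d, dt, _, wt = s
    d2, dt2, _, wt2 = t
    return d == d2 and (
        (dt == "fish" and wt == "white" and dt2 == "fish" and wt2 != "white")
        or (dt == "meat" and wt == "red" and dt2 == "meat" and wt2 != "red"))

relative = [(m[0], m[2]) for m in winnow(meal, c2)]
absolute = [(m[0], m[2]) for m in winnow(meal, c3)]
no_white = [m for m in meal if m[3] != "white"]
fish_without_white = [(m[0], m[2]) for m in winnow(no_white, c2)]

print(relative)
print(absolute)
print(fish_without_white)
```

Under `c2` the rose wine survives for both dishes because it is unrelated to the other wines, while under `c3` only the white wine remains for fish and the red wine for meat. The last query shows that red wine is selected for fish once no white wine is available.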

3 Properties of preference queries 
3.1 Preference relations 

Since pfs can be essentially arbitrary formulas, no properties of preference relations can be 
assumed. So our framework is entirely neutral in this respect. 

In the examples above, the preference relations were strict partial orders. This is likely 
to be the case for most applications of preference queries. However, there are cases where 
such relations fail to satisfy one of the properties of partial orders. We will see in Section 
?? when irreflexivity fails. For asymmetry: we may have two tuples $t_1$ and $t_2$ such that 
$t_1 \succ t_2$ and $t_2 \succ t_1$ simply because we may have one reason to prefer $t_1$ over $t_2$ and another 
reason to prefer $t_2$ over $t_1$. Similarly, transitivity is not always guaranteed [20, ?, 12, 18]. 
For example, $t_1$ may be preferred over $t_2$ and $t_2$ over $t_3$, but the gap between $t_1$ and $t_3$ with 
respect to some heretofore ignored property may be so large as to prevent preferring $t_1$ over 
$t_3$. Or, transitivity may have to be abandoned to prevent cycles in preferences. However, 
transitivity is essential for the correctness of the algorithms that compute winnow (Section 
4.2). 

It is not difficult to check the properties of a preference relation defined using a pure 
comparison ipf. 

Theorem 3.1 If a preference relation is defined using a pure comparison ipf in DNF, it 
can be checked in PTIME for irreflexivity and asymmetry. If the ipf is also in k-DNF for 
some fixed k, then the preference relation can be checked in PTIME for transitivity, negative 
transitivity, and connectivity. 

Proof: We discuss first asymmetry; the remaining properties can be handled in a similar 
way. If $t_1 \succ t_2$ is defined as $D_1 \vee \ldots \vee D_m$ and $t_2 \succ t_1$ as $D'_1 \vee \ldots \vee D'_m$, we can write down 
the negation of asymmetry as $(D_1 \vee \ldots \vee D_m) \wedge (D'_1 \vee \ldots \vee D'_m)$. This formula is satisfiable iff 
at least one of $m^2$ formulas $\phi_{i,j} \equiv D_i \wedge D'_j$, $i, j = 1, \ldots, m$, is satisfiable. Each formula $\phi_{i,j}$ 
is a conjunction of atomic formulas involving arithmetic comparison predicates. Thus its 
satisfiability can be checked in PTIME using the methods of [17]. Testing for transitivity, 
negative transitivity and connectivity requires writing down the negation of a DNF formula 
and distributing the negation inside. The restriction to k-DNF guarantees that we have 
again a polynomial number of PTIME satisfiability problems. □ 

Theorem 3.2 If a preference relation $\succ_C$ over $R$ is a strict partial order, then for every 
finite, nonempty instance $r$ of $R$, $\omega_C(r)$ is nonempty. 

If the properties of strict partial orders are not satisfied, then Theorem 3.2 may fail to 
hold and the winnow operator may return an empty set, even though the relation to which 
it is applied is nonempty. For instance, if $r_0 = \{t_0\}$ and $t_0 \succ t_0$ (violation of irreflexivity), 
then the winnow operator applied to $r_0$ returns an empty set. Similarly, if two tuples are 
involved in a violation of asymmetry, they may block each other from appearing in the result 
of the winnow operator. Also, if the relation $r$ is infinite, it may happen that $\omega_C(r) = \emptyset$, 
for example if $r$ contains all natural numbers and the preference relation is the standard 
ordering $>$. 

The winnow operator is not monotone or anti-monotone. 
Example 3.1 Consider the following preference formula $C_6$: 

$$x \succ_{C_6} y \equiv x = a \wedge y = b.$$

Then 

$$\{b\} = \omega_{C_6}(\{b\}) \not\subseteq \omega_{C_6}(\{a, b\}) = \{a\}.$$

Thus monotonicity and anti-monotonicity fail. 
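
The failure of monotonicity, and the empty results discussed before the example, can be reproduced with a few lines of Python. This is a sketch under assumed names (`winnow`, `c6`, `always`, `cyclic`); the constants a, b, t0, t1, t2 are encoded as strings.

```python
def winnow(r, pref):
    """Set-based winnow: the undominated elements of r."""
    return {t for t in r if not any(pref(s, t) for s in r)}

def c6(x, y):
    """x is preferred to y iff x = a and y = b."""
    return x == "a" and y == "b"

small = winnow({"b"}, c6)
large = winnow({"a", "b"}, c6)

# Monotonicity would require small <= large, anti-monotonicity large <= small.
monotone_here = small <= large
anti_monotone_here = large <= small

# A reflexive relation empties a singleton instance.
always = lambda x, y: True
empty_by_reflexivity = winnow({"t0"}, always)

# Two tuples that dominate each other block each other.
cyclic = lambda x, y: {x, y} == {"t1", "t2"}
empty_by_symmetry = winnow({"t1", "t2"}, cyclic)

print(small, large, monotone_here, anti_monotone_here)
print(empty_by_reflexivity, empty_by_symmetry)
```

Adding the tuple a to the instance removes b from the result and introduces a, so the result neither grows nor shrinks with the input.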

However, a form of monotonicity with respect to the preference formula parameter holds 
for winnow. 

Theorem 3.3 If $\succ_{C_1}$ and $\succ_{C_2}$ are preference relations over a relation schema $R$, and the 
formula 

$$\forall t_1, t_2 [C_1(t_1, t_2) \Rightarrow C_2(t_1, t_2)]$$

is valid, then for all instances $r$ of $R$, $\omega_{C_2}(r) \subseteq \omega_{C_1}(r)$. If $\succ_{C_1}$ and $\succ_{C_2}$ are strict partial 
orders, then the converse also holds. 

Proof: The first part is obvious. To see that the second part also holds, assume that for 
all relations $r$, $\omega_{C_2}(r) \subseteq \omega_{C_1}(r)$ but $C_1 \not\Rightarrow C_2$. Thus, $C_1 \wedge \neg C_2$ is satisfiable, and there are 
two tuples $t_1$ and $t_2$ such that $t_1 \succ_{C_1} t_2$ but $t_1 \not\succ_{C_2} t_2$. Consider now the instance $r_{12} = \{t_1, t_2\}$. 
Then $\omega_{C_1}(r_{12}) = \{t_1\}$ but $t_2 \in \omega_{C_2}(r_{12})$, a contradiction. □ 

Several properties of winnow follow directly from the definition (the first is listed in [?], 
although in a less general context): 

Proposition 3.1 For all preference relations $\succ_{C_1}$ and $\succ_{C_2}$ over a relation schema $R$ 
and every instance $r$ of $R$: 

$$\omega_{C_1 \vee C_2}(r) = \omega_{C_1}(r) \cap \omega_{C_2}(r)$$
$$\omega_{False}(r) = r$$
$$\omega_{True}(r) = \emptyset.$$
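
The three identities of Proposition 3.1 can be spot-checked on a concrete instance. In the Python sketch below, `c1` is the price preference of Example 2.1, while `c2` is a hypothetical vendor preference invented only for this check; a passing check on one instance illustrates the proposition but does not prove it.

```python
def winnow(r, pref):
    """Set-based winnow: the undominated tuples of r."""
    return {t for t in r if not any(pref(s, t) for s in r)}

def c1(s, t):
    """Same ISBN and strictly lower price."""
    return s[0] == t[0] and s[2] < t[2]

def c2(s, t):
    """Hypothetical: for the same ISBN, prefer the vendor QualityBooks."""
    return s[0] == t[0] and s[1] == "QualityBooks" and t[1] != "QualityBooks"

def c1_or_c2(s, t):
    """The disjunction C1 v C2."""
    return c1(s, t) or c2(s, t)

r = {
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
}

disjunction = winnow(r, c1_or_c2)
intersection = winnow(r, c1) & winnow(r, c2)
with_false = winnow(r, lambda s, t: False)
with_true = winnow(r, lambda s, t: True)

print(disjunction == intersection, with_false == r, with_true == set())
```

Here the two preferences conflict on the first ISBN, so the intersection loses every tuple for that book and only the two single-vendor books remain.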



3.2 Indifference 

There is a natural notion of indifference associated with our approach: two tuples $t_1$ and $t_2$ 
are indifferent ($t_1 \sim_C t_2$) if neither is preferred to the other one, i.e., $t_1 \not\succ_C t_2$ and $t_2 \not\succ_C t_1$. 

Proposition 3.2 For every preference relation $\succ_C$, every relation $r$ and all tuples 
$t_1, t_2 \in \omega_C(r)$, we have $t_1 = t_2$ or $t_1 \sim_C t_2$. 

It is a well-known result in decision theory [12, ?] that in order for a preference relation 
to be representable using scoring functions the relation has to be a weak order. This 
implies, in particular, that the corresponding indifference relation (defined as above) has to 
be transitive. This is not the case for the preference relation $\succ_{C_1}$ defined in Example 1.1. 
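
The non-transitivity of indifference for the Book preference can be verified directly on three tuples of the instance from Example 1.1. The sketch below uses assumed helper names (`c1`, `indifferent`) and stores prices as floats.

```python
def c1(s, t):
    """Same ISBN and strictly lower price."""
    return s[0] == t[0] and s[2] < t[2]

def indifferent(s, t, pref):
    """Neither tuple is preferred to the other."""
    return not pref(s, t) and not pref(t, s)

t1 = ("0679726691", "BooksForLess", 14.75)
t2 = ("0679726691", "LowestPrices", 13.50)
t4 = ("0062059041", "BooksForLess", 7.30)

first_link = indifferent(t1, t4, c1)    # different ISBNs
second_link = indifferent(t4, t2, c1)   # different ISBNs
closing_link = indifferent(t1, t2, c1)  # same ISBN, t2 is cheaper

print(first_link, second_link, closing_link)
```

The first and fourth tuples are indifferent, as are the fourth and the second, yet the second is strictly preferred to the first. This is the same obstruction that Example 1.2 states in terms of scores.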



4 The winnow operator 

In this section, we study various properties of the winnow operator: expressive power, 
monotonicity, commutativity and distributivity. Formulating such properties is essential 
for the evaluation and optimization of preference queries. We also briefly discuss some 
evaluation methods for winnow. 

Although, as we show, the winnow operator can be expressed in relational algebra, its 
explicit use makes possible a clean separation of preference formulas from other aspects of 
the query. This has several advantages. First, the properties of preference relations can be 
studied in an abstract way. Second, specialized query evaluation methods for the winnow 
operator can be developed. Third, algebraic properties of that operator can be formulated, 
in order to be used in query optimization. 



4.1 Expressive power 

The winnow operator can be expressed in relational algebra, and thus does not add any 
expressive power to it. Perhaps more surprisingly, winnow can be used to simulate set 
difference. 



Theorem 4.1 Relational algebra with winnow replacing set difference has the same 
expressive power as standard relational algebra. 

Proof: Clearly, the winnow operator is first-order definable. Thus any relational algebra 
query with winnow can be translated to relational calculus, and then back to relational 
algebra (without winnow). Such a construction is, however, mainly of theoretical importance. 

From a practical point of view, we show now the translation of the winnow operator 
$\omega_C(R)$ for $C = D_1 \vee \ldots \vee D_k$ which is a pure comparison ipf in DNF. Each $D_i$, 
$i = 1, \ldots, k$, is a formula over free variables $t_1$ and $t_2$. It can be viewed as a conjunction 
$D_i \equiv \phi_i \wedge \psi_i \wedge \gamma_i$ where $\phi_i$ refers only to the variables of $t_1$, $\psi_i$ to the variables of $t_2$, and $\gamma_i$ 
to the variables of both $t_1$ and $t_2$. The formula $\phi_i$ has an obvious translation to a selection 
condition $\Phi_i$ over $R$, and the formula $\psi_i$ a similar translation to a selection condition $\Psi_i$ 
over $\rho(R)$, where $\rho$ is a renaming of $R$. The formula $\gamma_i$ can similarly be translated to a join 
condition $\Gamma_i$ over $R$ and $\rho(R)$. Then 

$$\omega_C(R) = \rho^{-1}\Big(\rho(R) - \pi_{\rho(W)}\Big(\bigcup_{i=1}^{k} \big(\sigma_{\Phi_i}(R) \bowtie_{\Gamma_i} \sigma_{\Psi_i}(\rho(R))\big)\Big)\Big)$$

where $W$ is the list of attributes of $R$ and $\rho^{-1}$ is the inverse of the renaming $\rho$. 

We show now how to simulate the set difference operator $R - S$ using winnow. Assume 
that $R$ (and $S$) have the set of attributes $X$ of arity $k$. Then 

$$R - S = \pi_X(\sigma_{B=1}(\omega_{C_5}(R \times \{1\} \cup S \times \{0\})))$$

where $B$ is the last attribute of $R \times \{1\}$ and 

$$(x_1, \ldots, x_k, b) \succ_{C_5} (x'_1, \ldots, x'_k, b') \equiv x_1 = x'_1 \wedge \cdots \wedge x_k = x'_k \wedge b = 0 \wedge b' = 1.$$

This works as follows. Think of the attribute $B$ as a tag. All the tuples in $R$ (resp. 
$S$) are tagged with 1 (resp. 0). If a tuple is in $R \cap S$, then there are two copies of it in 
$R \times \{1\} \cup S \times \{0\}$: one tagged with 1, the other with 0. The latter one is preferred according 
to $\succ_{C_5}$. Finally, the selection $\sigma_{B=1}$ eliminates all the tuples in $S$, keeping the tuples that 
are only in $R$. □ 
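
The tagging construction used to simulate set difference can be traced step by step in Python. This is a minimal sketch assuming relations are sets of equal-length tuples; the names `winnow`, `c5` and `difference_via_winnow` are chosen here for illustration.

```python
def winnow(r, pref):
    """Set-based winnow: the undominated tuples of r."""
    return {t for t in r if not any(pref(s, t) for s in r)}

def c5(s, t):
    """Same attribute values, s tagged 0 (from S) and t tagged 1 (from R)."""
    return s[:-1] == t[:-1] and s[-1] == 0 and t[-1] == 1

def difference_via_winnow(r, s):
    # R x {1} union S x {0}: append the tag as the last attribute B.
    tagged = {x + (1,) for x in r} | {x + (0,) for x in s}
    # Copies tagged 1 are dominated whenever the same tuple also occurs in S.
    kept = winnow(tagged, c5)
    # Selection B = 1 followed by projection on the original attributes X.
    return {t[:-1] for t in kept if t[-1] == 1}

R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (4, "d")}

result = difference_via_winnow(R, S)
print(sorted(result))
```

The tuple shared by both relations appears twice after tagging, and winnow discards the copy tagged 1, so the final selection returns only tuples that occur in R alone.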

4.2 Evaluating winnow 

For completeness, we show here several algorithms that can be used to compute the result 
of the winnow operator $\omega_C(r)$. The first is a simple nested-loops algorithm (Figure 1). The 
second is BNL, an algorithm proposed in [?] in the context of skyline queries, a specific 
class of preference queries, but the algorithm is considerably more general (Figure 2). The 
third [?] is a variant of the second, in which a presorting step is used (Figure 3). All the 
algorithms use a fixed amount of main memory (a window). However, for the algorithm 
NL, this is not made explicit, since it is irrelevant for the properties of the algorithm that 
are of interest here. Our emphasis is not on the algorithms themselves - they are much more 
completely described and analyzed in the original papers - but rather on determining their 
scope. We will identify the classes of preference queries to which each of them is applicable. 

1. open a scan $S_1$ on $r$; 
2. for every tuple $t_1$ returned by $S_1$: 
   (a) open a scan $S_2$ on $r$; 
   (b) for every tuple $t_2$ returned by $S_2$: 
       if $t_2 \succ_C t_1$, then close $S_2$ and continue step 2 with the next tuple $t_1$; 
   (c) output $t_1$; 
   (d) close $S_2$; 
3. close $S_1$. 

Figure 1: NL: Nested Loops 



1. initialize the window $W$ and the temporary table $F$ to empty; 
2. make $r$ the input; 
3. repeat the following until the input is empty: 
   (a) for every tuple $t$ in the input: 
       • $t$ is dominated by a tuple in $W$ ⇒ ignore $t$, 
       • $t$ dominates some tuples in $W$ ⇒ eliminate the dominated tuples 
         and insert $t$ into $W$, 
       • $t$ is incomparable with all tuples in $W$ ⇒ insert $t$ into $W$ (if 
         there is room), otherwise add $t$ to $F$; 
   (b) output the tuples from $W$ that were added there when $F$ was empty, 
   (c) make $F$ the input, clear $F$. 

Figure 2: BNL: Blocked Nested Loops 



1. topologically sort $r$ according to $\succ_C$; 
2. make $r$ the input; 
3. initialize the window $W$ and the temporary table $F$ to empty; 
4. repeat the following until the input is empty: 
   (a) for every tuple $t$ in the input: 
       • $t$ is dominated by a tuple in $W$ ⇒ ignore $t$, 
       • $t$ is incomparable with all tuples in $W$ ⇒ insert $t$ into $W$ (if 
         there is room), otherwise add $t$ to $F$; 
   (b) output the tuples from $W$, 
   (c) make $F$ the input, clear $F$. 

Figure 3: SFS: Sort-Filter-Skyline 

The NL algorithm is correct for any preference relation $\succ_C$. In principle, the preference 
relation might even be reflexive, since the algorithm compares a tuple with itself. The BNL 
and SFS algorithms require the preference relation to be a strict partial order (for BNL 
this is noted in [?]). The algorithms require irreflexivity, because they do not compare a 
tuple with itself. Neither do they correctly handle symmetry: the situation where there 
are two tuples $t_1$ and $t_2$ such that $t_1 \succ_C t_2$ and $t_2 \succ_C t_1$. In this case, BNL will break 
the tie depending on the order in which the tuples appear, and SFS will fail altogether, 
being unable to produce a topological sort. To see the necessity of transitivity, consider the 
following example. 

Example 4.1 The preference relation $\succ_{C_6}$ is defined as follows: 

$$x \succ_{C_6} y \equiv x = a \wedge y = b \vee x = b \wedge y = c.$$




Now let us suppose that the window has room for only one tuple, and the tuples arrive in 
the following order: a, b, c. Then a will be in the window, and b will be discarded, which 
prevents b from blocking c. Therefore, BNL will output a (correctly) and c (incorrectly). 
Such an example can be easily generalized to any fixed window size, simply by assuming that 
a and b are separated in the input by sufficiently many values different from a, b and c. 
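
The behaviour described in this example can be reproduced with a small in-memory model of BNL. The Python sketch below is an assumption-laden simplification, not the algorithm as published: the window and the temporary table are ordinary lists, and tuples inserted into the window after the temporary table became non-empty are carried over and output at the end of the following pass, which is one possible reading of step (b) of Figure 2. The function names are chosen here.

```python
def winnow_nl(r, pref):
    """Nested loops: compare every tuple with every tuple of r."""
    return [t for t in r if not any(pref(s, t) for s in r)]

def bnl(r, pref, window_size):
    """Simplified BNL with a bounded window (assumes window_size >= 1)."""
    output = []
    window = []          # pairs (tuple, compared_with_everything_so_far)
    inp = list(r)
    while inp:
        overflow = []    # the temporary table F
        for t in inp:
            if any(pref(w, t) for w, _ in window):
                continue                      # t is dominated: ignore it
            # Eliminate window tuples dominated by t.
            window = [(w, done) for w, done in window if not pref(t, w)]
            if len(window) < window_size:
                # Complete only if nothing was written to F before t.
                window.append((t, not overflow))
            else:
                overflow.append(t)
        output.extend(w for w, done in window if done)
        # Incomplete tuples stay in the window for the next pass.
        window = [(w, True) for w, done in window if not done]
        inp = overflow
    return output

# Example 4.1: a is preferred to b, b to c, but a is not preferred to c.
def chain(x, y):
    return (x, y) in {("a", "b"), ("b", "c")}

def c1(s, t):
    """Book preference of Example 2.1 (a strict partial order)."""
    return s[0] == t[0] and s[2] < t[2]

book = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]

print(winnow_nl(["a", "b", "c"], chain))
print(bnl(["a", "b", "c"], chain, window_size=1))
print(sorted(bnl(book, c1, window_size=1)) == sorted(winnow_nl(book, c1)))
```

With the non-transitive relation and a one-tuple window, the sketch returns both a and c, whereas nested loops returns only a. On the Book instance, where the preference is a strict partial order, both procedures agree.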

4.3 Algebraic laws 

We present here a set of algebraic laws that govern the commutativity and distributivity 
of winnow w.r.t. relational algebra operators. This set constitutes a formal foundation for 
rewriting preference queries using the standard strategies like pushing selections down. We 
prove the soundness of the introduced laws. In the cases of selection, projection, union 
and difference, we show that the preconditions on the applicability of the laws are not only 
sufficient but also necessary. In the remaining cases, we show that the violations of the 
preconditions lead to the violations of the laws. In most interesting cases, the preconditions 
can also be efficiently checked. 

We adopt the set-based view of relational algebra operators and leave exploring the 
multiset-based view for future research. 

4.3.1 Commutativity of winnow 

We establish here a sufficient condition for winnow to be commutative. Commutativity 
is a fundamental property that makes it possible to move the winnow operator around in 
preference queries. 

Theorem 4.2 If $C_1$ and $C_2$ are preference formulas over a schema $R$ such that 

• the formula $\forall t_1, t_2 [C_1(t_1, t_2) \Rightarrow C_2(t_1, t_2)]$ is valid, and 

• $\succ_{C_1}$ and $\succ_{C_2}$ are strict partial orders, 

then for all finite instances $r$ of $R$: 

$$\omega_{C_1}(\omega_{C_2}(r)) = \omega_{C_2}(\omega_{C_1}(r)) = \omega_{C_2}(r).$$

Proof: We prove here the first equality; the second can be proved in a similar way. 

Assume $t \notin \omega_{C_2}(\omega_{C_1}(r))$ and $t \in \omega_{C_1}(\omega_{C_2}(r))$. Then also $t \in \omega_{C_2}(r)$. There are two 
possibilities: (1) $\exists t' \in \omega_{C_1}(r)$ such that $t' \succ_{C_2} t$. But then $t' \in r$, which contradicts the fact 
that $t \in \omega_{C_2}(r)$. (2) $t \notin \omega_{C_1}(r)$. But then by Theorem 3.3, $t \notin \omega_{C_2}(r)$, a contradiction. 

Assume $t \notin \omega_{C_1}(\omega_{C_2}(r))$ and $t \in \omega_{C_2}(\omega_{C_1}(r))$. Then also $t \in \omega_{C_1}(r)$. There are two 
possibilities: (1) $\exists t' \in \omega_{C_2}(r)$ such that $t' \succ_{C_1} t$. But then also $t' \in r$, which contradicts 
the fact that $t \in \omega_{C_1}(r)$. (2) $t \notin \omega_{C_2}(r)$. Still $t \in r$, since otherwise $t \notin \omega_{C_1}(r)$. Therefore, 
$\exists t' \in r$ such that $t' \succ_{C_2} t$. Now because $\succ_{C_2}$ is a strict partial order and $r$ is finite, we can 
choose $t' \in \omega_{C_2}(r)$. If $t' \in \omega_{C_1}(r)$, then in view of the fact that $t \in \omega_{C_2}(\omega_{C_1}(r))$ and $t' \succ_{C_2} t$, 
we get a contradiction. On the other hand, if $t' \notin \omega_{C_1}(r)$, then by Theorem 3.3 we get 
$t' \notin \omega_{C_2}(r)$, a contradiction. □ 



Consider now what happens if the assumptions in Theorem 4.2 are relaxed. 



Example 4.2 Let $Emp(EmpNo, YearEmployed, Salary)$ be a relation schema. Define
the following preference relations over it:

$$(e, y, s) \succ_{C_1} (e', y', s') \equiv s > s'$$

and

$$(e, y, s) \succ_{C_2} (e', y', s') \equiv y < y'.$$

Clearly, neither $C_1 \Rightarrow C_2$ nor $C_2 \Rightarrow C_1$. Consider the database $r_1 = \{(1, 1975, 100K), (2, 1980, 150K)\}$.
Now

$$\omega_{C_1}(\omega_{C_2}(r_1)) = \{(1, 1975, 100K)\} \neq \{(2, 1980, 150K)\} = \omega_{C_2}(\omega_{C_1}(r_1)).$$
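The behaviour in Example 4.2 can be checked mechanically. The following is a minimal sketch, assuming a naive quadratic `winnow` over Python sets of tuples and preference formulas written as two-argument predicates; the function name, the third tuple in `r2`, and the formula `c_both` are illustrative assumptions, not part of the paper.

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

# Example 4.2 over Emp(EmpNo, YearEmployed, Salary); salaries written out in full.
c1 = lambda a, b: a[2] > b[2]          # higher salary is preferred
c2 = lambda a, b: a[1] < b[1]          # earlier year is preferred
r1 = {(1, 1975, 100_000), (2, 1980, 150_000)}

left = winnow(c1, winnow(c2, r1))      # {(1, 1975, 100000)}
right = winnow(c2, winnow(c1, r1))     # {(2, 1980, 150000)}
print(left != right)                   # the two orders of application disagree

# A pair that does satisfy the hypotheses of Theorem 4.2: c_both implies c1,
# and both are strict partial orders (c_both and r2 are hypothetical).
c_both = lambda a, b: a[2] > b[2] and a[1] < b[1]
r2 = r1 | {(3, 1970, 160_000)}
agree = (winnow(c_both, winnow(c1, r2))
         == winnow(c1, winnow(c_both, r2))
         == winnow(c1, r2))
print(agree)
```

In the second half `c_both` plays the role of $C_1$ and `c1` the role of $C_2$ in Theorem 4.2, so all three expressions collapse to the winnow under the stronger relation.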



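Example 4.3 Consider the following preference relations:

$$x \succ_{C_1} y \equiv x = a \wedge y = b$$

and

$$x \succ_{C_2} y \equiv x = a \wedge y = b \vee x = b \wedge y = a.$$

Clearly, $C_1 \Rightarrow C_2$. However, $\succ_{C_2}$ is not a strict partial order. Taking, for instance, $r = \{a, b\}$, we have

$$\omega_{C_1}(\omega_{C_2}(r)) = \emptyset \neq \{a\} = \omega_{C_2}(\omega_{C_1}(r)).$$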



In Theorem 4.2, if the preference formula $C_2$ is a pure comparison ipf in $k$-DNF, then
checking the validity of the formula $\forall t_1, t_2 [C_1(t_1, t_2) \Rightarrow C_2(t_1, t_2)]$ can be done in PTIME.

4.3.2 Commuting selection and winnow



We identify in Theorem 4.3 below a sufficient and necessary condition under which the
winnow operator and a relational algebra selection commute. This is helpful for pushing
selections past winnow operators in preference queries. It is well known that moving
selections down in the query tree reduces the size of (and the time needed to materialize)
intermediate results and has the potential of enabling the use of indexes (if a selection is
pushed all the way down to a database relation that has an index matching the selection
condition).

Theorem 4.3 Given a relation schema $R$, a selection condition $C_1$ over $R$ and a preference
formula $C_2$ over $R$, if the formula

$$\forall t_1, t_2 [(C_1(t_2) \wedge C_2(t_1, t_2)) \Rightarrow C_1(t_1)]$$

is valid, then for all instances $r$ of $R$:

$$\sigma_{C_1}(\omega_{C_2}(r)) = \omega_{C_2}(\sigma_{C_1}(r)).$$

The converse holds under the assumption that $\succ_{C_2}$ is irreflexive.

Proof: We have that:

$$t \in \sigma_{C_1}(\omega_{C_2}(r)) \equiv t \in r \wedge C_1(t) \wedge (\neg \exists t' [t' \in r \wedge C_2(t', t)]).$$

On the other hand:

$$t \in \omega_{C_2}(\sigma_{C_1}(r)) \equiv t \in r \wedge C_1(t) \wedge (\neg \exists t' [t' \in r \wedge C_1(t') \wedge C_2(t', t)]).$$

Clearly, the first formula implies the second. To see that the opposite direction also holds,
assume that there is a tuple $t_0$ such that $t_0 \in r$ and $C_2(t_0, t)$ holds. $C_1(t)$ holds, thus $C_1(t_0)$
holds too, since otherwise the formula $\forall t_1, t_2 [(C_1(t_2) \wedge C_2(t_1, t_2)) \Rightarrow C_1(t_1)]$ would not be
valid.

To see the necessity of the condition of the theorem, assume that there are tuples $t_1$ and
$t_2$ such that $C_1(t_2) \wedge C_2(t_1, t_2) \wedge \neg C_1(t_1)$. Then

$$\omega_{C_2}(\sigma_{C_1}(\{t_1, t_2\})) = \{t_2\} \neq \emptyset = \sigma_{C_1}(\omega_{C_2}(\{t_1, t_2\})).$$

The irreflexivity of $\succ_{C_2}$ is necessary to ensure that $\omega_{C_2}(\sigma_{C_1}(\{t_1, t_2\}))$ is nonempty. □

If the preference formula $C_2$ in Theorem 4.3 is a pure comparison ipf and the selection
condition $C_1$ is in $k$-DNF and refers only to the arithmetic comparison predicates, then
checking the validity of the formula $\forall t_1, t_2 [(C_1(t_2) \wedge C_2(t_1, t_2)) \Rightarrow C_1(t_1)]$ can be done in PTIME.



Example 4.4 Consider the relation $Book(ISBN, Vendor, Price)$ from Example 1.1. The
preference relation $\succ_{C_1}$ is defined as

$$(i, v, p) \succ_{C_1} (i', v', p') \equiv i = i' \wedge p < p'.$$

Consider the query $\sigma_{Price<15}(\omega_{C_1}(Book))$. Now

$$\forall p, p', i, i' [(p' < 15 \wedge i = i' \wedge p < p') \Rightarrow p < 15]$$

is a valid formula, thus by Theorem 4.3

$$\omega_{C_1}(\sigma_{Price<15}(Book)) = \sigma_{Price<15}(\omega_{C_1}(Book)).$$

On the other hand, consider the selection $\sigma_{Price>15}$. Then

$$\forall p, p', i, i' [(p' > 15 \wedge i = i' \wedge p < p') \Rightarrow p > 15]$$

is not a valid formula, thus in this case the selection does not commute with winnow. Finally,
the selection $\sigma_{ISBN=c}$ for any string $c$ commutes with $\omega_{C_1}(Book)$, because

$$\forall p, p', i, i' [(i = c \wedge i = i' \wedge p < p') \Rightarrow i' = c]$$

is a valid formula.
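The two cases of Example 4.4 can be replayed on a concrete instance. This is a sketch only: the instance `book` is assembled from the tuples that appear in the result tables of this paper and is assumed (not confirmed here) to coincide with the instance $r_1$; `winnow` and `select` are illustrative helper names.

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

def select(cond, r):
    """Relational selection over a set of tuples."""
    return {t for t in r if cond(t)}

# Book(ISBN, Vendor, Price); assumed instance, see the note above.
book = {
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
}

# C1: same ISBN and strictly lower price.
c1 = lambda a, b: a[0] == b[0] and a[2] < b[2]

under_15 = lambda t: t[2] < 15
over_15 = lambda t: t[2] > 15

commutes_under = select(under_15, winnow(c1, book)) == winnow(c1, select(under_15, book))
commutes_over = select(over_15, winnow(c1, book)) == winnow(c1, select(over_15, book))
print(commutes_under, commutes_over)  # True False
```

For `Price > 15` the two plans differ because selecting first removes the cheap copy that would otherwise dominate the QualityBooks tuple, so that tuple survives the winnow.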

4.3.3 Commuting projection and winnow 

We deal now with projection. For winnow to commute with projection, the preference
formula needs to be restricted to the attributes in the projection. We denote by $t[X]$ the
tuple $(t[A_1], \ldots, t[A_k])$, where $X = A_1 \cdots A_k$ is a set of attributes.

Definition 4.1 Given a relation schema $R$, a set of attributes $X$ of $R$, and a preference
relation $\succ_C$ over $R$, the restriction $\theta_X(\succ_C)$ of $\succ_C$ to $X$ is a preference relation $\succ_{C'}$ defined
using the following formula:

$$u \succ_{C'} u' \equiv \forall t, t' [(t[X] = u \wedge t'[X] = u') \Rightarrow t \succ_C t'].$$

It is easy to see that if $\succ_C$ is a strict partial order, so is $\theta_X(\succ_C)$.

Theorem 4.4 Given a relation schema $R$, a set of attributes $X$ of $R$, and a preference
formula $C$ over $R$, if the following formulas are valid:

$$\forall t_1, t_2, t_3 [(t_1[X] = t_2[X] \wedge t_1[X] \neq t_3[X] \wedge t_1 \succ_C t_3) \Rightarrow t_2 \succ_C t_3],$$

$$\forall t_1, t_3, t_4 [(t_3[X] = t_4[X] \wedge t_1[X] \neq t_3[X] \wedge t_1 \succ_C t_3) \Rightarrow t_1 \succ_C t_4],$$

then for all instances $r$ of $R$:

$$\pi_X(\omega_C(r)) = \omega_{C'}(\pi_X(r)),$$

where $\succ_{C'} = \theta_X(\succ_C)$ is the restriction of $\succ_C$ to $X$. The converse holds under the assumption
that $\succ_C$ is irreflexive.

Proof: Assume $u \in \pi_X(\omega_C(r))$. Then there exists a tuple $t \in \omega_C(r)$ such that $t[X] = u$.
Assume $u \notin \omega_{C'}(\pi_X(r))$. Since $u \in \pi_X(r)$, there exists a tuple $u' \in \pi_X(r)$ such that $u' \succ_{C'} u$
and a tuple $t' \in r$ such that $t'[X] = u'$. Since $u' \succ_{C'} u$, it has to be the case that $t' \succ_C t$,
which contradicts the fact that $t \in \omega_C(r)$.

For the opposite direction, assume that $u \in \omega_{C'}(\pi_X(r))$ and $u \notin \pi_X(\omega_C(r))$. Then for
each tuple $t \in r$ such that $t[X] = u$, there is another tuple $t' \in r$ such that $t' \succ_C t$ and
$t'[X] \neq t[X]$. By the assumption of the theorem, each tuple $t'$ that dominates (in $\succ_C$)
one tuple $t$ such that $t[X] = u$, also dominates each such tuple. Also, any two tuples that
agree on $X$ dominate the same set of tuples. Therefore, if $u' = t'[X]$, then $u' \succ_{C'} u$, which
contradicts the fact that $u \in \omega_{C'}(\pi_X(r))$.

To show the converse, assume that the first condition is violated, i.e., there are three
tuples $t_1$, $t_2$ and $t_3$ such that $t_1[X] = t_2[X]$, $t_1[X] \neq t_3[X]$, $t_1 \succ_C t_3$ and $t_2 \not\succ_C t_3$. Let
$r_0 = \{t_1, t_2, t_3\}$. Then $t_3 \notin \omega_C(r_0)$, so $\pi_X(\omega_C(r_0)) = \{t_1[X]\}$. Now $t_1[X] \not\succ_{C'} t_3[X]$ (because
$t_2 \not\succ_C t_3$) and $t_1[X] \neq t_3[X]$. Thus

$$\omega_{C'}(\pi_X(r_0)) = \{t_1[X], t_3[X]\} \neq \{t_1[X]\} = \pi_X(\omega_C(r_0)).$$

The violation of the second condition also leads to a contradiction in a similar way. □



If the preference formula $C$ in Theorem 4.4 is a pure comparison ipf in $k$-DNF, then
checking the validity of the assumption of this theorem can be done in PTIME. If $C$ is a
pure comparison ipf, then $C'$ can be presented in an equivalent, quantifier-free form.

Example 4.5 Consider again the preference relation $\succ_{C_1}$ from Example 1.1:

$$(i, v, p) \succ_{C_1} (i', v', p') \equiv i = i' \wedge p < p'$$

over the relation schema $Book(ISBN, Vendor, Price)$. Then the relation $\succ_{C'} = \theta_{ISBN,Price}(\succ_{C_1})$
is defined as

$$(i, p) \succ_{C'} (i', p') \equiv \forall t, t' [(t[X] = (i, p) \wedge t'[X] = (i', p')) \Rightarrow t \succ_{C_1} t'] \equiv i = i' \wedge p < p'.$$

This confirms the intuition that the projection does not affect this particular preference
relation. It is easy to see that the condition of Theorem 4.4 is also satisfied, so winnow
commutes with projection in this case.
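As a sanity check of Example 4.5 on one instance, the sketch below projects onto $X = \{ISBN, Price\}$ and compares both sides of Theorem 4.4. The restriction $\succ_{C'}$ is entered by hand in its quantifier-free form from the example rather than derived; the instance and helper names are assumptions made for illustration.

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

def project(positions, r):
    """Set-based projection onto the given attribute positions."""
    return {tuple(t[i] for i in positions) for t in r}

# Book(ISBN, Vendor, Price); assumed instance built from tuples shown in this paper.
book = {
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
}

c1 = lambda a, b: a[0] == b[0] and a[2] < b[2]           # over (ISBN, Vendor, Price)
c1_on_x = lambda a, b: a[0] == b[0] and a[1] < b[1]      # restriction, over (ISBN, Price)

x = (0, 2)                                               # positions of ISBN and Price
lhs = project(x, winnow(c1, book))
rhs = winnow(c1_on_x, project(x, book))
print(lhs == rhs)
```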

4.3.4 Distributing winnow over Cartesian product 

For winnow to distribute (in a modified form) over the Cartesian product, the preference
formula needs to be in a special form. The form turns out to be the Pareto composition, well
known in multi-attribute utility theory [13]. Preference queries involving Pareto composition
are quite common: the skyline queries [4] without DIFF attributes are of this form.

Definition 4.2 Given two relation schemas $R_1$ and $R_2$, a preference relation $\succ_{C_1}$ over $R_1$
and a preference relation $\succ_{C_2}$ over $R_2$, the Pareto composition $P(\succ_{C_1}, \succ_{C_2})$ of $\succ_{C_1}$ and
$\succ_{C_2}$ is a preference relation $\succ_{C_0}$ over the Cartesian product $R_1 \times R_2$ defined as:

$$(t_1, t_2) \succ_{C_0} (t_1', t_2') \equiv t_1 \succeq_{C_1} t_1' \wedge t_2 \succeq_{C_2} t_2' \wedge (t_1 \succ_{C_1} t_1' \vee t_2 \succ_{C_2} t_2'),$$

where

$$x \succeq_C y \equiv x \succ_C y \vee x = y.$$

Clearly, if $\succ_{C_1}$ and $\succ_{C_2}$ are strict partial orders, so is $P(\succ_{C_1}, \succ_{C_2})$.

Theorem 4.5 Given two relation schemas $R_1$ and $R_2$, a preference relation $\succ_{C_1}$ over $R_1$
and a preference relation $\succ_{C_2}$ over $R_2$, for any relations $r_1$ and $r_2$ which are instances of
$R_1$ and $R_2$, resp., the following property holds:

$$\omega_{C_0}(r_1 \times r_2) = \omega_{C_1}(r_1) \times \omega_{C_2}(r_2),$$

where $\succ_{C_0} = P(\succ_{C_1}, \succ_{C_2})$.

Proof: Assume $(t_1, t_2) \in \omega_{C_0}(r_1 \times r_2)$ but $(t_1, t_2) \notin \omega_{C_1}(r_1) \times \omega_{C_2}(r_2)$. Then $t_1 \notin \omega_{C_1}(r_1)$
or $t_2 \notin \omega_{C_2}(r_2)$. Assume the first. Since $(t_1, t_2) \in r_1 \times r_2$ and $t_1 \in r_1$, there must be a tuple
$t_1' \in r_1$ such that $t_1' \succ_{C_1} t_1$. Then the tuple $(t_1', t_2) \in r_1 \times r_2$ and $(t_1', t_2) \succ_{C_0} (t_1, t_2)$, which
contradicts the fact that $(t_1, t_2) \in \omega_{C_0}(r_1 \times r_2)$. The second case is symmetric.

Assume now that $(t_1, t_2) \in \omega_{C_1}(r_1) \times \omega_{C_2}(r_2)$ and $(t_1, t_2) \notin \omega_{C_0}(r_1 \times r_2)$. Then there is
a tuple $(t_1', t_2') \in r_1 \times r_2$ such that $(t_1', t_2') \succ_{C_0} (t_1, t_2)$. Consequently, $t_1' \succ_{C_1} t_1$ or $t_2' \succ_{C_2} t_2$.
Both cases lead to a contradiction with the fact that $(t_1, t_2) \in \omega_{C_1}(r_1) \times \omega_{C_2}(r_2)$. □

We show now that a slight variation of the Pareto composition, even though it appears 
to be more natural, fails to achieve the distributivity of winnow over product. 

Example 4.6 Define a different composition $\succ_{C'}$ of two preference relations $\succ_{C_1}$ and $\succ_{C_2}$
as follows:

$$(t_1, t_2) \succ_{C'} (t_1', t_2') \equiv t_1 \succ_{C_1} t_1' \wedge t_2 \succ_{C_2} t_2'.$$

Consider the following preference relations:

$$x \succ_{C_1} y \equiv x \succ_{C_2} y \equiv x > y.$$

If $r_1 = \{1\}$ and $r_2 = \{1, 2\}$, then

$$\omega_{C_1}(r_1) \times \omega_{C_2}(r_2) = \{(1, 2)\} \neq \{(1, 1), (1, 2)\} = \omega_{C'}(r_1 \times r_2).$$
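The contrast between Definition 4.2 and the variant of Example 4.6 can be reproduced directly. The sketch below is illustrative only: `pareto` and `both_strict` are hypothetical helper names, and product tuples are represented as Python pairs.

```python
from itertools import product

def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

def pareto(p1, p2):
    """Pareto composition of Definition 4.2 over pairs (t1, t2)."""
    weak1 = lambda x, y: p1(x, y) or x == y
    weak2 = lambda x, y: p2(x, y) or x == y
    return lambda a, b: (weak1(a[0], b[0]) and weak2(a[1], b[1])
                         and (p1(a[0], b[0]) or p2(a[1], b[1])))

def both_strict(p1, p2):
    """The variant of Example 4.6: strictly better in both components."""
    return lambda a, b: p1(a[0], b[0]) and p2(a[1], b[1])

gt = lambda x, y: x > y
r1, r2 = {1}, {1, 2}
cross = set(product(r1, r2))

factored = set(product(winnow(gt, r1), winnow(gt, r2)))   # {(1, 2)}
via_pareto = winnow(pareto(gt, gt), cross)                # {(1, 2)}
via_variant = winnow(both_strict(gt, gt), cross)          # {(1, 1), (1, 2)}
print(via_pareto == factored, via_variant == factored)    # True False
```

The variant fails because $(1, 2)$ is not strictly better than $(1, 1)$ in the first component, so it cannot eliminate it.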

4.3.5 Distributing winnow over union and difference 

It is possible to distribute winnow over union or difference only in the trivial case where the 
preference relation is an anti-chain. We call two relation schemas compatible if they have 
the same number of attributes and the corresponding attributes have the same domains. 

Theorem 4.6 Given two compatible relation schemas $R$ and $S$ and an irreflexive preference
relation $\succ_C$ over $R$, we have for all relations $r$ and $s$

$$\omega_C(r \cup s) = \omega_C(r) \cup \omega_C(s)$$

and

$$\omega_C(r - s) = \omega_C(r) - \omega_C(s)$$

if and only if $\succ_C = \emptyset$.

Proof: Clearly, if $\succ_C = \emptyset$ and $\succ_C$ is irreflexive, then

$$\omega_C(r) \cup \omega_C(s) = r \cup s = \omega_C(r \cup s).$$

To show that this is a necessary condition, assume that $\succ_C \neq \emptyset$. Then there are two tuples
$t_1$ and $t_2$ such that $t_1 \succ_C t_2$. Now

$$\omega_C(\{t_1, t_2\}) = \{t_1\} \neq \{t_1, t_2\} = \omega_C(\{t_1\}) \cup \omega_C(\{t_2\}).$$

The proof for difference is similar. □
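The counterexample in the proof can be instantiated with any nonempty preference relation. A small sketch, assuming integers ordered by `>` as the preference (an illustrative choice, not taken from the paper):

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

gt = lambda x, y: x > y            # a nonempty, irreflexive preference relation
r, s = {1, 2}, {2}

union_whole = winnow(gt, {1} | {2})               # {2}
union_parts = winnow(gt, {1}) | winnow(gt, {2})   # {1, 2}

diff_whole = winnow(gt, r - s)                    # {1}
diff_parts = winnow(gt, r) - winnow(gt, s)        # set()

# With the empty preference relation, winnow is the identity and both laws hold.
empty = lambda x, y: False
laws_hold_for_empty = (
    winnow(empty, r | s) == winnow(empty, r) | winnow(empty, s)
    and winnow(empty, r - s) == winnow(empty, r) - winnow(empty, s)
)
print(union_whole != union_parts, diff_whole != diff_parts, laws_hold_for_empty)
```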

5 Composition of preferences 

Preference relations may be composed in many different ways. In general, we distinguish
between multi-dimensional and uni-dimensional composition. In multi-dimensional
composition, we have a number of preference relations defined over several database relation
schemas, and we define a preference relation over the Cartesian product of those relations.
An example is Pareto composition (Definition 4.2). Another example is lexicographic
composition. In uni-dimensional composition, a number of preference relations over a single
database schema are composed, producing another preference relation over the same schema.
Examples include Boolean and prioritized composition (discussed below).

Since in our framework preference relations are defined by first-order preference formulas,
any first-order definable composition of preference relations leads again to first-order
preference formulas, which in turn can be used as parameters of the winnow operator. The
composition does not even have to be first-order definable, as long as it produces a
(first-order) preference formula. We will see an example of the latter in Section 5.3, when we
discuss transitive closure.

5.1 Boolean composition 

Union, intersection and difference of preference relations are obviously captured by the
Boolean operations on the corresponding preference formulas. For example, the following
formula captures the preference $\succ_C = \succ_{C_1} \cap \succ_{C_2}$:

$$x \succ_C y \equiv x \succ_{C_1} y \wedge x \succ_{C_2} y.$$

Table 1 summarizes the preservation of properties of relations by the appropriate Boolean
composition operator.

5.2 Preference hierarchies 

It is often the case that preferences form hierarchies. For instance, I may have a general
preference for red wine but in specific cases, e.g., when eating fish, this preference is
overridden by the one for white wine. Also a preference for less expensive books (Example 1.1)
can be overridden by a preference for certain vendors.

                Union   Intersection   Difference
Irreflexivity   Yes     Yes            Yes
Asymmetry       No      Yes            Yes
Transitivity    No      Yes            No

Table 1: Properties Preserved by Boolean Composition



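Definition 5.1 Consider two preference relations $\succ_{C_1}$ and $\succ_{C_2}$ defined over the same
schema $U$. The prioritized composition $\succ_{C_{1,2}} = \succ_{C_1} \triangleright \succ_{C_2}$ of $\succ_{C_1}$ and $\succ_{C_2}$ is defined
as:

$$t_1 \succ_{C_{1,2}} t_2 \equiv t_1 \succ_{C_1} t_2 \vee (t_2 \not\succ_{C_1} t_1 \wedge t_1 \succ_{C_2} t_2).$$

The prioritized composition $\succ_{C_1} \triangleright \succ_{C_2}$ has the following intuitive reading: prefer
according to $\succ_{C_2}$ unless $\succ_{C_1}$ is applicable.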



Example 5.1 Continuing Example 1.1, instead of the preference relation $\succ_{C_1}$ defined there
as follows:

$$(i, v, p) \succ_{C_1} (i', v', p') \equiv i = i' \wedge p < p',$$

we consider the relation $\succ_{C_0} \triangleright \succ_{C_1}$ where $\succ_{C_0}$ is defined by the following formula $C_0$:

$$(i, v, p) \succ_{C_0} (i', v', p') \equiv i = i' \wedge v = \text{'BooksForLess'} \wedge v' = \text{'LowestPrices'}.$$

Assume the preference relation $\succ_{C_{0,1}} = \succ_{C_0} \triangleright \succ_{C_1}$ (the definition of $\succ_{C_{0,1}}$ is easily obtained
from the formulas $C_0$ and $C_1$ by substitution). Then $\omega_{C_{0,1}}(r_1)$ returns the following tuples



ISBN         Vendor         Price
0679726691   BooksForLess   $14.75
0062059041   BooksForLess   $7.30
0374164770   LowestPrices   $21.88



Note that now a more expensive copy of the first book is preferred, due to the preference for
'BooksForLess' over 'LowestPrices'. However, 'BooksForLess' does not offer the last book,
and that's why the copy offered by 'LowestPrices' is preferred.



Theorem 5.1 If $\succ_{C_1}$ and $\succ_{C_2}$ are preference relations, so is $\succ_{C_{1,2}}$. If $\succ_{C_1}$ and $\succ_{C_2}$ are
both irreflexive or asymmetric, so is $\succ_{C_{1,2}}$.

However, a relation defined as the prioritized composition of two transitive preference 
relations does not have to be transitive. 

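Example 5.2 Consider the following preference relations:

$$a \succ_{C_1} b, \quad b \succ_{C_2} c.$$

Both $\succ_{C_1}$ and $\succ_{C_2}$ are trivially transitive. However, $\succ_{C_1} \triangleright \succ_{C_2}$ is not: it relates $a$ to $b$
and $b$ to $c$, but not $a$ to $c$.

Theorem 5.2 Prioritized composition is associative:

$$(\succ_{C_1} \triangleright \succ_{C_2}) \triangleright \succ_{C_3} = \succ_{C_1} \triangleright (\succ_{C_2} \triangleright \succ_{C_3})$$

and distributes over union:

$$\succ_{C_1} \triangleright (\succ_{C_2} \cup \succ_{C_3}) = (\succ_{C_1} \triangleright \succ_{C_2}) \cup (\succ_{C_1} \triangleright \succ_{C_3}).$$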
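Definition 5.1 translates into a one-line combinator, which makes Examples 5.1 and 5.2 easy to replay. The sketch below is an illustration under stated assumptions: the Book instance is assembled from tuples shown in this paper and assumed to match $r_1$, `prioritized` is a hypothetical helper name, and the relation `p3` is invented for a single spot check of associativity.

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

def prioritized(p1, p2):
    """Prioritized composition: p1 decides where it applies, otherwise p2."""
    return lambda x, y: p1(x, y) or (not p1(y, x) and p2(x, y))

# Example 5.2: the composition of two transitive relations need not be transitive.
p1 = lambda x, y: (x, y) == ("a", "b")
p2 = lambda x, y: (x, y) == ("b", "c")
p12 = prioritized(p1, p2)
print(p12("a", "b"), p12("b", "c"), p12("a", "c"))   # True True False

# Example 5.1 on the assumed Book instance.
book = {
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
}
c0 = lambda a, b: a[0] == b[0] and a[1] == "BooksForLess" and b[1] == "LowestPrices"
c1 = lambda a, b: a[0] == b[0] and a[2] < b[2]
best = winnow(prioritized(c0, c1), book)

# One spot check of associativity (Theorem 5.2); p3 is hypothetical.
p3 = lambda x, y: (x, y) == ("c", "a")
left = prioritized(prioritized(p1, p2), p3)
right = prioritized(p1, prioritized(p2, p3))
associative_here = all(left(x, y) == right(x, y) for x in "abc" for y in "abc")
print(associative_here)
```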

Thanks to the associativity and distributivity of $\triangleright$, the above construction can be
generalized to an arbitrary finite partial priority order between preference relations. Such an
order can be viewed as a graph in which the nodes consist of preference relations and the
edges represent relative priorities (there would be an edge $(\succ_{C_1}, \succ_{C_2})$ in the situation
described above). To encode this graph as a single preference relation, one would construct
first the definitions corresponding to individual paths from roots to leaves, and then take a
disjunction of all such definitions.

There are many other ways of combining preferences. For instance, the paper [?] defines
an infinite family of uni-dimensional composition operators for preference relations on the
basis of two basic operators. Since all the definitions are first-order, every preference relation
defined in the framework of [?] can also be defined in ours. In [?], it is proved that the
operators in the defined family exhaust all operators satisfying a number of intuitively
plausible postulates. It turns out that the operator $\triangleright$ defined above cannot be captured
in the framework of [?], because it violates one of those postulates: it does not preserve
transitivity.

5.3 Transitive closure 

We address here the issue of transitively closing a preference relation. We have seen an
example (Example 1.1) of a preference relation that is already transitive. However, there
are cases when we expect the preference relation to be the transitive closure of another
preference relation which is not transitive.

Example 5.3 Consider the following relation:

$$x \succ_C y \equiv x = a \wedge y = b \vee x = b \wedge y = c.$$

In this relation, $a$ and $c$ are not related, though there are contexts in which this might be
natural. (Assume I prefer walking to driving, and driving to riding a bus. Thus, I
also prefer walking to riding a bus.)

In our framework, we can specify the preference relation $\succ_{C^*}$ to be the transitive closure
of another preference relation $\succ_C$ defined using a first-order formula. This is similar to
transitive closure queries in relational databases. However, there is an important difference.
In databases, we are computing the transitive closure of a finite relation, while here we are
transitively closing an infinite relation defined using a first-order formula.

Definition 5.2 The transitive closure of a preference relation $\succ_C$ over a relation schema
$R$ is a preference relation $\succ_{C^*}$ over $R$ defined as:

$$t_1 \succ_{C^*} t_2 \text{ iff } t_1 \succ^n_C t_2 \text{ for some } n > 0,$$

where:

$$t_1 \succ^1_C t_2 \equiv t_1 \succ_C t_2$$

$$t_1 \succ^{n+1}_C t_2 \equiv \exists t_3.\ t_1 \succ_C t_3 \wedge t_3 \succ^n_C t_2.$$



Clearly, in general Definition 5.2 leads to infinite formulas. However, as Theorem 5.3 
shows, in many important cases the preference relation >~c* will in fact be defined by a 
finite formula. 

Theorem 5.3 If a preference relation $\succ_C$ is defined using a pure comparison ipf, the
transitive closure $\succ_{C^*}$ of $\succ_C$ is also defined using a pure comparison ipf and that definition can
be effectively obtained.

Proof: The computation of the transitive closure can in this case be formulated as the
evaluation of Datalog with order or gap-order (for integers) constraints. Suppose $\succ_C$ is
defined as:

$$x \succ_C y \equiv \alpha_1(x, y) \vee \cdots \vee \alpha_n(x, y).$$

Then the Datalog program that computes the formula $C^*$ defining $\succ_{C^*}$ looks as follows:

    T(x,y) <- α1(x,y).
    ...
    T(x,y) <- αn(x,y).
    S(x,y) <- T(x,y).
    S(x,y) <- T(x,z), S(z,y).

The evaluation of this program terminates [?] and its result, collected in $S$, represents
the desired formula. □

An analogous result holds if instead of arithmetic comparisons we consider equality
constraints over an infinite domain [?].



Example 5.4 Continuing Example 5.3, we obtain the following preference relation $\succ_{C^*}$ by
transitively closing $\succ_C$:

$$x \succ_{C^*} y \equiv x = a \wedge y = b \vee x = b \wedge y = c \vee x = a \wedge y = c.$$
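For a preference relation that happens to be a finite set of pairs, as in Examples 5.3 and 5.4, the two recursive rules for $S$ can be evaluated by naive fixpoint iteration. The sketch below is only that special case, written under the assumption that the relation is given explicitly; it is not the constraint-based Datalog evaluation the proof of Theorem 5.3 relies on for relations defined by formulas over infinite domains.

```python
def transitive_closure(t):
    """Least fixpoint of S(x,y) <- T(x,y) and S(x,y) <- T(x,z), S(z,y)."""
    s = set(t)
    while True:
        # One application of the recursive rule: join T with the current S.
        derived = {(x, y) for (x, z) in t for (w, y) in s if z == w}
        if derived <= s:
            return s
        s |= derived

closure = transitive_closure({("a", "b"), ("b", "c")})
print(sorted(closure))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# Closing an irreflexive relation can introduce reflexive pairs.
cyclic = transitive_closure({("a", "b"), ("b", "a")})
print(("a", "a") in cyclic)  # True
```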



Theorem 5.3 is not in conflict with the well-known non-first-order definability of
transitive closure on finite structures. In the latter case it is shown that there is no finite
first-order formula expressing transitive closure for arbitrary (finite) binary relations. In
Theorem 5.3 the relation to be closed, although possibly infinite, is fixed (since it is defined
using the given ipf). In particular, given an encoding of a fixed finite binary relation using
an ipf, the transitive closure of this relation is defined using another ipf.

The transitive closure of an irreflexive (resp. asymmetric) preference relation may fail to
be irreflexive (resp. asymmetric).

6 Applications and extensions



We show here how to use winnow to express special classes of preference queries: skylines and 
queries involving scoring functions, and how to use winnow together with other operators 
of the relational algebra to express more complex decision problems involving preferences. 
We consider the following: integrity constraints, extrinsic preferences, and aggregation. 



6.1 Special classes of preference queries 
6.1.1 Skylines 

Skyline queries [4] find all the tuples in a relation that are not dominated by any other tuples
in the same relation in all dimensions. This is exactly the notion of Pareto composition
(Definition 4.2) in an arbitrary number of dimensions.

Figure 4 shows an example of a two-dimensional skyline where the dominance relationship
is >. The skyline elements are marked with thick black dots.

Figure 4: Two-dimensional skyline

The authors of [4] propose to write skyline queries using the following extension to SQL:

    SELECT ... FROM ... WHERE ...
    GROUP BY ... HAVING ...
    SKYLINE OF A1 [MIN | MAX | DIFF], ..., An [MIN | MAX | DIFF]

The values of a MIN attribute are minimized, those of a MAX attribute maximized. A 
DIFF attribute indicates that tuples with different values of that attribute are incomparable. 
The SKYLINE clause is applicable after all other SQL clauses. 

Clearly, skylines can be expressed using winnow. The winnow is applied to an SQL
view that expresses the non-skyline constructs in a skyline query. The preference formula
is easily obtained from the SKYLINE clause. For example:

    SKYLINE OF A DIFF, B MAX, C MIN

in a relation $R$ is equivalent to $\omega_C(R)$ where

$$(x, y, z) \succ_C (x', y', z') \equiv x = x' \wedge y \geq y' \wedge z \leq z' \wedge (y > y' \vee z < z').$$

Finally, we note that $\omega_{C_1}(Book)$ from Example 2.1 is also a skyline query in which the
skyline clause looks as follows:



SKYLINE OF ISBN DIFF, PRICE MIN. 
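The translation from a SKYLINE clause to a preference formula is mechanical. The following sketch shows one way to build the dominance predicate from a list of (attribute position, directive) pairs; the function names and the sample relation `r` are hypothetical, and the naive `winnow` stands in for whatever skyline algorithm an engine would actually use.

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

def skyline_pref(spec):
    """Build a preference predicate from (position, 'MIN' | 'MAX' | 'DIFF') pairs."""
    for _, kind in spec:
        if kind not in ("MIN", "MAX", "DIFF"):
            raise ValueError(f"unknown skyline directive: {kind}")

    def pref(a, b):
        strictly_better = False
        for i, kind in spec:
            if kind == "DIFF":
                if a[i] != b[i]:          # different DIFF values: incomparable
                    return False
            elif kind == "MIN":
                if a[i] > b[i]:
                    return False
                strictly_better = strictly_better or a[i] < b[i]
            else:  # MAX
                if a[i] < b[i]:
                    return False
                strictly_better = strictly_better or a[i] > b[i]
        return strictly_better

    return pref

# SKYLINE OF A DIFF, B MAX, C MIN over a hypothetical relation R(A, B, C).
r = {("x", 3, 5), ("x", 2, 5), ("x", 3, 7), ("x", 1, 1), ("y", 1, 9)}
sky = winnow(skyline_pref([(0, "DIFF"), (1, "MAX"), (2, "MIN")]), r)
print(sorted(sky))  # [('x', 1, 1), ('x', 3, 5), ('y', 1, 9)]
```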



6.1.2 Queries involving scoring functions 

Sometimes a relation schema $R$ comes with a scoring function that associates a value $f(t)$
with every tuple $t$ in a possible instance of $R$. Now finding the tuples that maximize a
scoring function $f$ can be done by computing $\omega_{C_f}(r)$ for the given instance $r$ of $R$, where

$$t \succ_{C_f} t' \equiv f(t) > f(t').$$

This approach can be generalized to compute not only the top scoring tuples but also
those whose score differs from the top score by at most a given value or a given percentage.
For example, the tuples that differ from the top score by at most $d$ are computed by
$\omega_{C_{f,d}}(r)$, where

$$t \succ_{C_{f,d}} t' \equiv f(t) - d > f(t').$$
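Both scoring preferences fit one parameterized predicate. A brief sketch, assuming a made-up relation of (name, score) pairs and a hypothetical helper `score_pref`:

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

def score_pref(f, d=0):
    """t beats t' when f(t) - d > f(t'); d = 0 is the plain top-score preference."""
    return lambda t, u: f(t) - d > f(u)

r = {("a", 10), ("b", 9), ("c", 7)}     # hypothetical (name, score) tuples
f = lambda t: t[1]

top = winnow(score_pref(f), r)          # {('a', 10)}
near_top = winnow(score_pref(f, 2), r)  # {('a', 10), ('b', 9)}
print(sorted(top), sorted(near_top))
```

With $d = 2$ the tuple with score 7 is eliminated because $10 - 2 > 7$, while the tuple with score 9 survives because no score exceeds it by more than 2.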

Queries that return the tuples with top-N scores [7] can also be captured using winnow
together with SQL, using the approach described later in this section. Essentially, for each
tuple $t$ we will determine using SQL the number $n(t)$ of tuples with higher scores than $t$
and use the expression $N - n(t)$, where $N$ is the number of tuples in the relation, to define a
new scoring function. This function is then used to define a preference relation as in the
preceding paragraph. It appears, however, that in terms of the efficiency of query evaluation
this approach will be inferior to the approach in which top-N queries are supported directly
by the query engine.

Formally, we say that a real-valued function $f$ over a schema $R$ represents a preference
relation $\succ_C$ over $R$ iff

$$\forall t_1, t_2 [t_1 \succ_C t_2 \text{ iff } f(t_1) > f(t_2)].$$

As pointed out earlier, not every preference relation which is a strict partial order can be
expressed using a scoring function. A necessary condition is that the relation be a weak
order [13]. We can ask for the motivation behind this notion of representation. It is easy
to show that

Theorem 6.1 A real-valued function $f$ represents a preference relation $\succ_C$ iff for every
finite instance $r$ of $R$, the set $\omega_C(r)$ is equal to the set of tuples of $r$ assuming the maximum
value of $f$.

Thus, if a scoring function does not represent a preference relation, that fact can be detected
by winnow evaluated over some instance.

There are other, weaker forms of representation of preference relations by scoring
functions. For instance, if we only require that

$$\forall t_1, t_2 [t_1 \succ_C t_2 \Rightarrow f(t_1) > f(t_2)],$$

then for every strict partial order there is a scoring function representing it. However, in
this case we can only guarantee that the set of tuples in a given instance $r$ that maximize
$f$ is a subset of $\omega_C(r)$.



6.2 Integrity constraints 

There are cases when we wish to impose a constraint on the result of the winnow operator.
In the book example, we may say that we are interested only in the books under $15. In Example
2.2, we may restrict our attention only to the meat or fish dishes (note that currently the
dishes that are not meat or fish do not have a preferred kind of wine). In the same example,
we may ask for a specific number of meal recommendations.

In general, we need to distinguish between local and global constraints. A local constraint
imposes a condition on the components of a single tuple, for instance Book.Price<$15. A
global constraint imposes a condition on a set of tuples. The first two examples above
are local constraints; the third is global. To satisfy a global constraint on the result of the
winnow operator, one would have to construct a maximal subset of this answer that satisfies
the constraint. Since in general there may be more than one such subset, the required
construction cannot be described using a single relational algebra query. On the other hand,
local constraints are easily handled, since they can be expressed using selection. In general,
it matters whether the selection is applied before or after the winnow operator. Theorem
4.3 identifies sufficient and necessary conditions for winnow and selection to commute.



Example 6.1 Consider the situation where we have a specific preference ordering for cars, 
e.g., prefer BMW to Chevrolet, but also have a limited budget (captured by a selection 
condition). Then clearly, selecting the most desirable affordable car will not give the same 
result as selecting the most desirable cars if they are affordable. 



A veto expresses a prohibition on the presence of a specific set of values in the elements
of the answer to a preference query and thus can be viewed as a local constraint. To veto a
specific tuple $w = (a_1, \ldots, a_n)$ in a relation $S$ (which can be defined by a preference query)
of arity $n$, we write the selection:

$$\sigma_{A_1 \neq a_1 \vee \cdots \vee A_n \neq a_n}(S).$$



6.3 Intrinsic vs. extrinsic preferences 

So far we have talked only about intrinsic preference formulas. Such formulas establish the
preference relation between two tuples purely on the basis of the values occurring in those
tuples. Extrinsic preference formulas may refer not only to built-in predicates but also to
other constructs, e.g., database relations. In general, extrinsic preferences can use a variety
of criteria: properties of the relations from which the tuples were selected, properties of
other relations, or comparisons of aggregate values, and do not even have to be defined
using first-order formulas.

It is possible to express some extrinsic preferences using the winnow operator together 
with other relational algebra operators using the following multi-step strategy: 

1. using a relational query, combine all the information relevant for the preference in a 
single relation, 

2. apply the appropriate winnow operator to this relation, 

3. project out the extra columns introduced in the first step. 

The following example demonstrates the above strategy, as well as the use of aggregation 
for the formulation of preferences. 

Example 6.2 Consider again the relation Book(ISBN, Vendor, Price). Suppose for each 
book a preferred vendor (there may be more than one) is a vendor that sells the maximum 
total number of books. Clearly, this is an extrinsic preference since it cannot be established 
solely by comparing pairs of tuples from this relation. However, we can provide the required 
aggregate values and connect them with individual books through new, separate views: 

    CREATE VIEW BookNum(Vendor, Num) AS
      SELECT B1.Vendor, COUNT(DISTINCT B1.ISBN)
      FROM Book B1
      GROUP BY B1.Vendor;

    CREATE VIEW ExtBook(ISBN, Vendor, Num) AS
      SELECT B1.ISBN, B1.Vendor, BN.Num
      FROM Book B1, BookNum BN
      WHERE B1.Vendor = BN.Vendor;

Now the extrinsic preference is captured by the query

$$\pi_{ISBN,Vendor}(\omega_{C_5}(ExtBook))$$

where the preference formula $C_5$ is defined as follows:

$$(i, v, n) \succ_{C_5} (i', v', n') \equiv i = i' \wedge n > n'.$$
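The three-step strategy of Example 6.2 can be run end to end on a stock SQL engine. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the Book rows are assembled from tuples shown in this paper and assumed to match the running instance, and the encoding of winnow as a `NOT EXISTS` subquery is this sketch's own choice, not syntax proposed in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Book (ISBN TEXT, Vendor TEXT, Price REAL);
INSERT INTO Book VALUES
  ('0679726691', 'BooksForLess', 14.75),
  ('0679726691', 'LowestPrices', 13.50),
  ('0679726691', 'QualityBooks', 18.80),
  ('0062059041', 'BooksForLess', 7.30),
  ('0374164770', 'LowestPrices', 21.88);

-- Step 1: gather the aggregate needed by the preference into one relation.
CREATE VIEW BookNum(Vendor, Num) AS
  SELECT B1.Vendor, COUNT(DISTINCT B1.ISBN)
  FROM Book B1
  GROUP BY B1.Vendor;

CREATE VIEW ExtBook(ISBN, Vendor, Num) AS
  SELECT B1.ISBN, B1.Vendor, BN.Num
  FROM Book B1, BookNum BN
  WHERE B1.Vendor = BN.Vendor;
""")

# Steps 2 and 3: winnow under C5 (same ISBN, larger Num), then project out Num.
rows = conn.execute("""
SELECT E.ISBN, E.Vendor
FROM ExtBook E
WHERE NOT EXISTS (
  SELECT 1 FROM ExtBook D
  WHERE D.ISBN = E.ISBN AND D.Num > E.Num
)
ORDER BY E.ISBN, E.Vendor
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

On this instance BooksForLess and LowestPrices each sell two distinct titles and QualityBooks sells one, so only the QualityBooks copy is eliminated.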

Example 6.3 To see another example of extrinsic preference, consider the situation in
which we prefer any tuple from a relation $R$ over any tuple from a relation $S$ which is
disjoint from $R$. Notice that this is truly an extrinsic preference, since it is based on where
the tuples come from and not on their values. It can be handled in our approach by tagging
the tuples with the appropriate relation names (easily done in relational algebra or SQL)
and then defining the preference relation using the tags. If there is a tuple which belongs both
to $R$ and $S$, then the above preference relation will fail to be irreflexive and the simulation
using intrinsic preferences will not work. Note also that an approach similar to tagging was
used in Example 2.2 (wine and dish types play the role of tags).



Example 6.4 Suppose user preferences are stored in a database relation $Pref(A, B)$. Then
one can define an extrinsic preference relation:

$$x \succ_{Pref} y \equiv Pref(x, y).$$

Such a preference relation cannot be defined using a pure comparison ipf, because the
transitive closure of a preference relation defined using an ipf is finite (Theorem 5.3), while that
of $\succ_{Pref}$ is infinite.



7 Iterated preferences and ranking 

We show here that the framework presented so far can be further developed to capture other 
preference-related concepts like ranking. We also present a variant of winnow suitable to 
preference relations that are not partial orders. 



7.1 Ranking 

A natural notion of ranking is implicit in our approach. A ranking is defined using iterated 
preference. 

Definition 7.1 Given a preference relation $\succ_C$ defined by a pf $C$, the $n$-th iteration of the
winnow operator $\omega_C$ in $r$ is defined as:

$$\omega^1_C(r) = \omega_C(r)$$

$$\omega^{n+1}_C(r) = \omega_C(r - \bigcup_{1 \leq i \leq n} \omega^i_C(r))$$

For example, the query $\omega^2_C(r)$ computes the set of "second-best" tuples.

Example 7.1 Continuing the book example, the query $\omega^2_{C_1}(r_1)$ returns

ISBN         Vendor         Price
0679726691   BooksForLess   $14.75

and the query $\omega^3_{C_1}(r_1)$ returns

ISBN         Vendor         Price
0679726691   QualityBooks   $18.80

Therefore, by iterating the winnow operator one can rank the tuples in a given relation
instance.

Theorem 7.1 If a preference relation $\succ_C$ over a relation schema $R$ is a strict partial order,
then for every finite instance $r$ of $R$ and every tuple $t \in r$, there exists an $i$, $i \geq 1$, such
that $t \in \omega^i_C(r)$.

Proof: Assume there is a tuple $t_0 \in r$ such that for all $i \geq 1$, $t_0 \notin \omega^i_C(r)$. Select the
least $i_0$ such that $\forall i \geq i_0$, $\omega^i_C(r) = \emptyset$ (such an $i_0$ always exists due to the finiteness of $r$).
Clearly, $t_0 \notin \omega^{i_0}_C(r)$, thus $t_0 \in r - \bigcup_{1 \leq i \leq i_0 - 1} \omega^i_C(r)$. Then there must be a tuple $t_1$ such that
$t_1 \succ_C t_0$ and $t_1 \in r - \bigcup_{1 \leq i \leq i_0 - 1} \omega^i_C(r)$ (otherwise $t_0 \in \omega^{i_0}_C(r)$). Since $\succ_C$ is a strict partial
order, there has to be an infinite increasing chain in $r$, a contradiction with the finiteness
of $r$. □

We define now the ranking operator $\eta_C(R)$.

Definition 7.2 If $R$ is a relation schema and $C$ a preference formula defining a preference
relation $\succ_C$ over $R$, then the ranking operator is written as $\eta_C(R)$, and for every instance
$r$ of $R$:

$$\eta_C(r) = \{(t, i) \mid t \in \omega^i_C(r)\}.$$
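Definitions 7.1 and 7.2 suggest a simple loop: repeatedly winnow the tuples that have not been ranked yet. The sketch below is an illustration with hypothetical names (`rank`, `by_rank`) over a Book instance assumed to match $r_1$; the early exit covers preference relations that are not strict partial orders, where some tuples may never receive a rank.

```python
def winnow(pref, r):
    """Tuples of r that no tuple of r is preferred to; pref(u, t) means 'u beats t'."""
    return {t for t in r if not any(pref(u, t) for u in r)}

def rank(pref, r):
    """Pairs (t, i) such that t is returned by the i-th iteration of winnow on r."""
    remaining, ranked, i = set(r), set(), 1
    while remaining:
        best = winnow(pref, remaining)
        if not best:          # remaining tuples block each other; stop ranking
            break
        ranked |= {(t, i) for t in best}
        remaining -= best
        i += 1
    return ranked

book = {
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
}
c1 = lambda a, b: a[0] == b[0] and a[2] < b[2]

by_rank = {}
for t, i in rank(c1, book):
    by_rank.setdefault(i, set()).add(t)
print(by_rank[2], by_rank[3])
```

On this instance the second and third iterations return the BooksForLess and QualityBooks copies of the first book, in agreement with Example 7.1.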

One can now study the algebraic properties of the ranking operator that parallel those
that we established for winnow in Section 4. We list here only one property which is the
most important one from a practical point of view: commutativity of selection with ranking.
In this context, ranking enjoys identical properties to winnow.

Theorem 7.2 Given a relation schema $R$, a selection condition $C_1$ over $R$ and a preference
formula $C_2$ over $R$, if the formula

$$\forall t_1, t_2 [(C_1(t_2) \wedge C_2(t_1, t_2)) \Rightarrow C_1(t_1)]$$

is valid, then for all instances $r$ of $R$:

$$\sigma_{C_1}(\eta_{C_2}(r)) = \eta_{C_2}(\sigma_{C_1}(r)).$$

The converse holds under the assumption that $\succ_{C_2}$ is irreflexive.



Proof: The proof is by induction on tuple rank. The base case follows from Theorem [O 
and the inductive case from the observation that 

a Cl (" n ct 1 (r)) = a Cl ("c 2 (r- (J w&r))) = co C2 (a Cl (r - \J ^(r))) 

l<i<n l<i<n 

which is equal to 

UGi{<TCi{r) - o- Cl ( |J u l c (r))) = u;c 2 (o- Cl (r) - \J a Cl (^c( r )))- 

l<i<n l<i<n 

under the assumptions of the theorem. rj 
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7.2 Weak winnow 



If a preference relation is not a strict partial order, then Theorems 3.2 and 7.1 may fail to
hold. A number of tuples can block each other from appearing in the result of any iteration
of the winnow operator. However, even in this case there may be a weaker form of ranking
available.



Example 7.2 Consider Examples 1.1 and 5.1. If the preference formula $C'$ is defined as
$C_0 \vee C_1$, then the first two tuples of the instance $r_1$ block each other from appearing in the
result of $\omega_{C'}(r_1)$, since according to $C_0$ the first tuple is preferred to the second but just the
opposite is true according to $C_1$. Intuitively, both those tuples should be preferred to (and
ranked higher than) the third tuple. But since neither the first nor the second tuple is a
member of $\omega_{C'}(r_1)$, none of the first three tuples can be ranked.

To deal with preference relations that are not strict partial orders, we define a new, 
weaker form of the winnow operator. We relax the asymmetry and irreflexivity requirements 
but preserve transitivity. 

To define this operator, we notice that as long as the preference relation $\succ_C$ is transitive,
we can use it to define another preference relation $\succ_{C_>}$ which is a strict partial order:

$$x \succ_{C_>} y \equiv x \succ_C y \wedge y \not\succ_C x.$$



Definition 7.3 If R is a relation schema and $\succ_C$ a transitive preference relation over
R, then the weak winnow operator is written as $\psi_C(R)$ and for every instance r of R,
$\psi_C(r) = \omega_{C_>}(r)$.

It follows from the definition that

$$\psi_C(r) = \{t \in r \mid \forall t' \in r.\ t \succ_C t' \vee t' \not\succ_C t\}.$$

Thus the weak winnow operator returns all the tuples that are dominated only by the tuples 
that they dominate themselves. 
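The two equivalent formulations of weak winnow can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the tuple layout (ISBN, vendor, price), the relation `no_dearer` and the instance `offers` are hypothetical. The relation `no_dearer` is transitive but neither irreflexive nor asymmetric, so plain winnow returns nothing for it.

```python
def winnow(prefers, rows):
    """Tuples of rows to which no tuple of rows is preferred."""
    rows = set(rows)
    return {t for t in rows if not any(prefers(u, t) for u in rows)}


def strict_part(prefers):
    """The derived relation: x is preferred to y, and y is not preferred to x."""
    return lambda x, y: prefers(x, y) and not prefers(y, x)


def weak_winnow(prefers, rows):
    """Weak winnow as winnow over the strict part; prefers is assumed transitive."""
    return winnow(strict_part(prefers), rows)


def weak_winnow_direct(prefers, rows):
    """Tuples t such that every t' is dominated by t or does not dominate t."""
    rows = set(rows)
    return {t for t in rows if all(prefers(t, u) or not prefers(u, t) for u in rows)}


# Hypothetical transitive relation: same book, price not higher.
def no_dearer(t1, t2):
    return t1[0] == t2[0] and t1[2] <= t2[2]


# Hypothetical instance with tuples (ISBN, vendor, price).
offers = {("A", "v1", 10), ("A", "v2", 10), ("A", "v3", 12), ("B", "v1", 7)}

blocked = winnow(no_dearer, offers)     # every tuple dominates itself
best = weak_winnow(no_dearer, offers)   # cheapest offers, ties included
```

The two equally priced offers of book "A" dominate each other, so neither survives plain winnow, but both are returned by weak winnow.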



Example 7.3 Considering Example 7.2, we see that the query $\psi_{C'}(r_1)$ now returns

ISBN        Vendor        Price
0679726691  BooksForLess  $14.75
0679726691  LowestPrices  $13.50
0062059041  BooksForLess  $7.30
0374164770  LowestPrices  $21.88



Below we formulate a few properties of the weak winnow operator. Using Theorems 3.3
and 3.2 (notice that $C_> \Rightarrow C$), we immediately obtain the following theorem.



Theorem 7.3 If R is a relation schema and $\succ_C$ a transitive preference relation over R,
then:

• for every instance r of R, $\omega_C(r) \subseteq \psi_C(r)$.

• for every finite, nonempty relation instance r of R, $\psi_C(r)$ is nonempty.

One can define the iteration of the weak winnow operator similarly to that of the winnow 
operator (Definition 7.1). 
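A possible reading of this iteration, sketched in Python under the same pattern as Definition 7.1: each round applies weak winnow to the tuples not returned by earlier rounds. The names `weak_winnow`, `weak_rank`, `no_dearer` and the instance `offers` are hypothetical and chosen only for this example.

```python
def weak_winnow(prefers, rows):
    """Tuples t such that every t' is dominated by t or does not dominate t."""
    rows = set(rows)
    return {t for t in rows if all(prefers(t, u) or not prefers(u, t) for u in rows)}


def weak_rank(prefers, rows):
    """Map each tuple to the iteration of weak winnow in which it appears."""
    remaining, ranks, i = set(rows), {}, 1
    while remaining:
        level = weak_winnow(prefers, remaining)
        if not level:  # guard for relations that are not transitive
            break
        for t in level:
            ranks[t] = i
        remaining -= level
        i += 1
    return ranks


# Hypothetical transitive relation: same book, price not higher.
def no_dearer(t1, t2):
    return t1[0] == t2[0] and t1[2] <= t2[2]


# Hypothetical instance with tuples (ISBN, vendor, price).
offers = {("A", "v1", 10), ("A", "v2", 10), ("A", "v3", 12), ("B", "v1", 7)}
ranks = weak_rank(no_dearer, offers)
```

For a transitive relation and a finite instance, every round over a nonempty remainder is nonempty by Theorem 7.3, so the loop assigns a rank to every tuple.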



Theorem 7.4 If a preference relation $\succ_C$ over a relation schema R is transitive, then for
every finite instance r of R and for every tuple $t \in r$, there exists an i, $i \geq 1$, such that
$t \in \psi^i_C(r)$.

8 Related work 

8.1 Preference queries 



[24] originated the study of preference queries. It proposed an extension of the relational
calculus in which preferences for tuples satisfying given logical conditions can be expressed.
For instance, one could say: Among the tuples of R satisfying Q, I prefer those satisfying
$P_1$; among the latter I prefer those satisfying $P_2$. Such a specification was to mean the
following: Pick the tuples satisfying $Q \wedge P_1 \wedge P_2$; if the result is empty, pick the tuples
satisfying $Q \wedge P_1 \wedge \neg P_2$; if the result is empty, pick the remaining tuples of R satisfying Q.
This can be simulated in our framework as the relational algebra expression $\omega_{C^*}(\sigma_Q(R))$
where $C^*$ is an ipf defined in the following way:

1. obtain the formula C defining a preference relation $\succ$

$$t_1 \succ t_2 \equiv P_1(t_1) \wedge P_2(t_1) \wedge P_1(t_2) \wedge \neg P_2(t_2) \vee P_1(t_1) \wedge \neg P_2(t_1) \wedge \neg P_1(t_2),$$

2. transform C into DNF to obtain an ipf C', and 

3. close the result transitively to obtain an ipf $C^*$ defining a transitive preference relation
$\succ^*$ (as described earlier).

Other kinds of logical conditions from [24] can be similarly expressed in our framework.
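The simulation of such layered preferences can be sketched in Python as follows. This is an illustration under assumptions, not the construction of [24]: the closed formula is written by hand as a comparison of three levels (both conditions, only the first, neither), which presumes that each level is satisfiable, and the conditions `q`, `p1`, `p2` and the instance `r` are hypothetical.

```python
def winnow(prefers, rows):
    """Tuples of rows to which no tuple of rows is preferred."""
    rows = set(rows)
    return {t for t in rows if not any(prefers(u, t) for u in rows)}


def layered_preference(p1, p2):
    """Hand-computed transitive closure of the two-step preference formula."""
    def level(t):
        if p1(t) and p2(t):
            return 0          # satisfies P1 and P2
        if p1(t):
            return 1          # satisfies P1 but not P2
        return 2              # does not satisfy P1
    return lambda t1, t2: level(t1) < level(t2)


# Hypothetical tuples (title, vendor, price, in_stock) and conditions.
def q(t):
    return t[3]                      # Q: the book is in stock

def p1(t):
    return t[2] < 20                 # P1: price below 20

def p2(t):
    return t[1] == "BooksForLess"    # P2: preferred vendor

prefers_star = layered_preference(p1, p2)


def best(rows):
    """Winnow under the closed preference, applied to the selection by Q."""
    return winnow(prefers_star, {t for t in rows if q(t)})


x = ("x", "BooksForLess", 15, True)
y = ("y", "LowestPrices", 12, True)
z = ("z", "BooksForLess", 30, True)
w = ("w", "BooksForLess", 5, False)
r = {x, y, z, w}
```

Calling `best(r)` returns the tuples satisfying all three conditions when there are any, and otherwise falls back to the next nonempty level, mirroring the cascade described above.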



Maximum/minimum value preferences (as in Example 1.1) are handled in [24] through the
explicit use of aggregate functions. The use of such functions is implicit in the definition of
our winnow operator.

Unfortunately, [24] does not contain a formal definition of the proposed language, so a
complete comparison with our approach is not possible. It should be noted, however, that
the framework of [24] seems unable to capture very simple conditional preferences like the
ones in Examples 2.2 and 5.3. Also, it can only handle strict partial orders of bounded depth
(except in the case where aggregate functions can be used, as in Example 1.1). Hierarchical
or iterated preferences are not considered.



[16] was one of the sources of inspiration for the present paper. It defines Preference
Datalog: a combination of Datalog and clausally-defined preference relations. Preference
Datalog captures, among others, the class of preference queries discussed in [24]. The
declarative semantics of Preference Datalog is based on the notion of preferential consequence,
introduced earlier by the authors in [15]. This semantics requires preferences to be reflexive
and transitive. Also, the operational semantics of Preference Datalog uses specialized
versions of the standard logic program evaluation methods: bottom-up [16] or top-down [15].
In the context of database queries, the approach proposed in the present paper achieves
goals similar to those of [15] and [16], remaining, however, entirely within the relational data
model and classical first-order logic. Finally, [15, 16] do not address some of the issues we
deal with in the present paper, like transitive closure of preferences, prioritized composition
or iterated preferences (a concept similar to the last one is presented under the name of
"relaxation"). More importantly, the issues of embedding the framework into a real relational
query language and optimizing preference queries are not addressed.

[22, 23] propose an (independently developed) framework similar to the one presented 
in this paper and in [H]. A formal language for formulating preference relations is described. 
The language has a number of base preference constructors and their combinators (Pareto 
and lexicographic composition, intersection, disjoint union and others). Clearly, all of those 
can be captured in our framework. On the other hand, [22, 23] do not consider the
possibility of having arbitrary operation and predicate signatures in preference formulas, and
do not identify any specific classes of preference formulas. Neither do they consider
extrinsic preferences, complex preferences involving aggregation, or ranking. However, the
embedding into relational query languages they use is identical to ours (it is called Best
Match Only, instead of winnow). While some possible rewritings for preference queries are
presented in [22], the abstract properties of winnow that we described earlier are not
identified. Finally, [23] describes an implementation of the framework of [22] using a language
called Preference SQL, which is translated to SQL, and several deployed applications.

0] introduces the skyline operator and describes several evaluation methods for this 
operator. As shown in Section ^, skyline is a special case of winnow. It is restricted to 
use a pure comparison ipf which is a conjunction of pairwise comparisons of corresponding 
tuple components. So in particular Example [T^ does not fit in that framework. Some 
examples of possible rewritings for skyline queries are given but no general rewriting rules 
are formulated. 

uses quantitative preferences in queries and focuses on the issues arising in combining 
such preferences. [19] explores in this context the problems of efficient query processing. 
Since the preferences in this approach are based on comparing the scores of individual tuples 
under given scoring functions, they have to be intrinsic. However, the simulation of extrinsic 
preferences using intrinsic ones (Section 6) is not readily available in this approach because
the scoring functions are not integrated with the query language. So, for instance, Example
6.2 cannot be handled. In fact, even for preference relations that satisfy the property
of transitivity of the corresponding indifference relation, it is not clear whether the scoring
function capturing the preference relation can be defined intrinsically (i.e., with the function
value determined solely by the values of the tuple components). The general construction
of a scoring function on the basis of a preference relation [12, 13] does not provide such a
definition. So the exact expressive power of the quantitative approach to preference queries
remains unclear.



8.2 Preferences in logic and artificial intelligence 



The papers on preference logics 25, 18] address the issue of capturing the common- 
sense meaning of preference through appropriate axiomatizations. Preferences are defined 
on formulas, not tuples, and with the exception of [25, 10] limited to the propositional case.
[25] proposes a modal logic of preference, and [10] studies preferences in the context of
relation algebras. The application of the results obtained in this area to database queries
is unclear.

The papers on preference reasoning [30, |28|, ||| attempt to develop practical mechanisms
for making inferences about preferences and solving decision or configuration problems
similar to the one described in Example 2.2. A central notion there is that of ceteris paribus
preference: preferring one outcome to another, all else being equal. Typically, the problems
addressed in this work are propositional (or finite-domain). Such problems can be encoded
in the relational data model and the inferences obtained by evaluating preference queries.
A detailed study of such an approach remains to be done. We note that the use of a
full-fledged query language in this context makes it possible to formulate considerably more
complex decision and configuration problems than before.

The work on prioritized logic programming and nonmonotonic reasoning [§, |ll|, ^] has 
potential applications to databases. However, like it relies on specialized evaluation 
mechanisms, and the preferences considered are typically limited to rule priorities. 

9 Conclusions and future work 

We have presented a framework for specifying preferences using logical formulas and its
embedding into relational algebra. As a result, preference queries and complex decision
problems involving preferences can be formulated in a simple and clean way.

Clearly, our framework is limited to applications that can be entirely modeled within the 
relational model of data. Here are several examples that do not quite fit in this paradigm: 

• preferences defined between sets of elements; 

• heterogeneous preferences between tuples of different arity or type (how to say I prefer
a meal without a wine to a meal with one in Example 2.2?);



• preferences requiring nondeterministic choice. We believe this is properly handled
using a nondeterministic choice [14] or witness [|[] operator.

In addition to addressing the above limitations, future work directions include: 

• evaluation and optimization of preference queries, including cost-based optimization; 

• extrinsic preferences; 

• defeasible and default preferences;

• preference elicitation. 